On the Use of Commodity Ethernet Technology in Exascale HPC Systems

Exascale systems will require large networks with hundreds of thousands of endpoints. Ethernet technology is employed in a significant fraction of the Top500 systems, and will remain a cost-effective alternative for HPC interconnection. However, its current design does not scale to Exascale systems. Different solutions have been proposed for scalable Ethernet fabrics for data centers, but not specifically for HPC applications. This work identifies the major differences in network requirements between the two environments. Based on these differences, it studies the application of Ethernet to Exascale HPC systems, considering the topology, routing, forwarding table management, and address assignment, with a focus on performance and power. Our scalability solution relies on OpenFlow switches to implement hierarchical MAC addressing, and introduces compaction mechanisms to reduce TCAM table size. To simplify deployment, a protocol denoted DMP performs automated address assignment without interfering with layer-2 service announcement protocols. An analysis of the latency requirements of HPC applications shows that their communication phases are very short, which makes controller-centric adaptive routing infeasible. We introduce Conditional OpenFlow rules, a mechanism that allows for adaptive routing with proactive rule instantiation: routing decisions are taken in the switch depending on network status, without controller interaction. This mechanism supports multiple topologies that require minimal or nonminimal adaptive routing and that improve performance and power consumption. Altogether, this work introduces a realistic and competitive implementation of a scalable lossless Ethernet network for Exascale-level HPC environments, considering low-diameter and low-power topologies such as Flattened Butterflies or Dragonflies, and allowing for power savings of up to 54%.
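To make the idea of hierarchical MAC addressing with table compaction more concrete, the following is a minimal sketch, not the scheme defined in this work. It assumes a hypothetical split of the 48-bit MAC address into an 8-bit flags octet, a 16-bit group, a 12-bit switch, and a 12-bit host field; the field widths and all function names are illustrative assumptions. It shows how a single masked (value, mask) entry, the kind of match a TCAM stores, can cover every endpoint of a remote group instead of one exact-match entry per host.

```python
# Hypothetical field layout (an assumption for illustration only):
# [ 8-bit flags | 16-bit group | 12-bit switch | 12-bit host ] = 48 bits
GROUP_BITS, SWITCH_BITS, HOST_BITS = 16, 12, 12
FLAGS = 0x02  # first octet: locally administered, unicast


def encode_mac(group: int, switch: int, host: int) -> int:
    """Pack a (group, switch, host) location into a 48-bit MAC value."""
    assert 0 <= group < (1 << GROUP_BITS)
    assert 0 <= switch < (1 << SWITCH_BITS)
    assert 0 <= host < (1 << HOST_BITS)
    return (
        (FLAGS << 40)
        | (group << (SWITCH_BITS + HOST_BITS))
        | (switch << HOST_BITS)
        | host
    )


def format_mac(value: int) -> str:
    """Render a 48-bit value in the usual colon-separated notation."""
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def group_rule(group: int) -> tuple[int, int]:
    """Build one masked entry that matches every host in `group`."""
    mask = ((1 << (8 + GROUP_BITS)) - 1) << (SWITCH_BITS + HOST_BITS)
    value = (FLAGS << 40) | (group << (SWITCH_BITS + HOST_BITS))
    return value, mask


def matches(mac: int, rule: tuple[int, int]) -> bool:
    """Ternary match: compare only the bits selected by the mask."""
    value, mask = rule
    return (mac & mask) == value


if __name__ == "__main__":
    mac = encode_mac(group=3, switch=5, host=7)
    print(format_mac(mac))
    print(matches(mac, group_rule(3)))
```

Under these assumptions, a switch outside group 3 needs only the one entry returned by `group_rule(3)` to forward traffic toward any of that group's hosts, so its table grows with the number of groups rather than the number of endpoints.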