Extending commodity OpenFlow switches for large-scale HPC deployments

Commodity Ethernet networks are used in many HPC systems. Extensions based on OpenFlow have been proposed for large HPC deployments to address scalability and power consumption concerns. Such designs employ low-diameter topologies, such as Flattened Butterflies or Dragonflies, to minimize power consumption. However, these topologies require non-minimal adaptive routing to cope with varying traffic characteristics and avoid pathological behaviors. The solution to this issue in previous work relies on Ethernet Pauses to select between minimal and non-minimal routing, depending on the availability (Pause status) of each corresponding output port. Nevertheless, such a design suffers from high average latency under adversarial traffic patterns and reduced peak throughput under uniform traffic. This paper identifies the causes of these issues and presents a preliminary study of alternative solutions that exploit commodity congestion notification messages (QCN, 802.1Qau), currently available in datacenter switches. This work presents the main differences between a congestion control mechanism such as QCN, which throttles injection and thereby reduces average network load, and an adaptive routing mechanism, which diverts traffic away from the congested area but increases average network load. In particular, it identifies the difficulty of distinguishing uniform traffic at saturation from adversarial traffic at low loads.
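
As an illustration only, the Pause-based routing decision described above could be sketched as follows. The abstract does not give the exact policy used in previous work, so the function name `choose_output_port`, its parameters, and the first-available tie-break among non-minimal ports are all assumptions made for this sketch.

```python
from typing import Optional, Sequence, Set


def choose_output_port(minimal_port: int,
                       nonminimal_ports: Sequence[int],
                       paused_ports: Set[int]) -> Optional[int]:
    """Pick an output port using only per-port Pause status.

    Hypothetical sketch: names and tie-break order are assumptions,
    not the mechanism specified by the paper or by OpenFlow.
    """
    # Prefer the minimal route whenever its output port is not paused.
    if minimal_port not in paused_ports:
        return minimal_port
    # Otherwise divert to the first non-minimal port that is not paused.
    for port in nonminimal_ports:
        if port not in paused_ports:
            return port
    # Every candidate port is paused: the packet has to wait.
    return None
```

For example, `choose_output_port(0, [1, 2], {0})` selects port 1, while an empty set of paused ports always keeps the packet on the minimal port 0. In this sketch the only input to the decision is a binary per-port signal.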