The need for more accurate and complete scientific results demands an increase in computing power, driving the development of Exascale machines capable of performing 10^18 floating-point operations per second. One approach to reaching Exascale is to increase the number of nodes in the machine, which places stronger demands on the system interconnect. The impact of the interconnect in HPC systems grows further with the surge of BigData applications, which have high network communication demands and behave differently from traditional HPC workloads, with a higher volume of communications and a more even distribution of traffic. Performance limitations can reasonably be expected to scale up with the network size, and some of them are likely to translate from system networks to the networks-on-chip within high-performance nodes.
This thesis introduces a synthetic traffic model of the communications in the Graph500 benchmark for BigData applications. This traffic model simplifies the evaluation of data-intensive applications and their needs, and makes it possible to predict their behavior on machines larger than those currently available. An analysis of the benchmark communications shows a stronger dependence on network throughput than in traditional HPC applications.
Both BigData and HPC workloads can be significantly affected by fairness in network usage. This work analyzes throughput fairness and evaluates the impact of different implicit and explicit fairness mechanisms. The fairness analysis and the evaluation of two proposed mechanisms have been performed through synthetic traffic simulations in a two-level hierarchical Dragonfly network with more than 15,000 nodes. Dragonflies are one of the high-radix, low-diameter network topologies proposed for Exascale system interconnects, and so far the only one to have been implemented in a commercial system. A novel adversarial-consecutive traffic pattern is introduced for the evaluation of throughput fairness; it particularly stresses the links of one router in each group of the Dragonfly.
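The shape of such an adversarial pattern can be sketched as follows. This is a simplified model for exposition only: the function name and the exact destination rule (every node in a group targeting the next consecutive group, so that traffic concentrates on the few global links attached to one router) are illustrative assumptions, not taken verbatim from the thesis.

```python
# Illustrative sketch of adversarial traffic in a Dragonfly: all nodes in
# source group g send to group (g + offset) mod G, forcing their packets
# through the global link(s) of a single router and creating a bottleneck.
# Names and the offset rule are assumptions for exposition.

def adversarial_consecutive_dest_group(src_group: int, num_groups: int,
                                       offset: int = 1) -> int:
    """Destination group for traffic originating in `src_group`."""
    return (src_group + offset) % num_groups

# Example with 9 groups: the last group wraps around to group 0.
groups = 9
dests = [adversarial_consecutive_dest_group(g, groups) for g in range(groups)]
print(dests)  # [1, 2, 3, 4, 5, 6, 7, 8, 0]
```

Under minimal routing, every packet from a group competes for the same inter-group links, which is what makes the pattern useful for exposing throughput unfairness.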
Results with the synthetic traffic model confirm that network throughput is a significant constraint. They also reveal different communication distributions depending on the number of processes and their mapping to network nodes. Throughput unfairness can limit average performance figures and even lead to starvation at routers that become bottlenecks under adversarial traffic scenarios.
Prioritizing in-transit traffic over new injections favours network drainage and reduces congestion, but it is disadvantageous with adaptive routing because it prevents injection from bottleneck routers and aggravates throughput unfairness. Two mechanisms are proposed to improve network performance and simplify the router implementation. The first improves the detection of adversarial traffic patterns through contention information, using a metric based on contention counters. Four different implementations relying on this metric are evaluated.
Evaluation results show that contention counters provide competitive performance and much faster adaptation to traffic changes, avoiding the routing oscillations typical of congestion-based adaptive routing mechanisms.
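One way to picture a contention-counter metric is the following minimal sketch. The class name, the per-port counter update rules, and the fixed threshold policy are hypothetical assumptions; the thesis evaluates four different implementations of the counter-based metric, none of which is reproduced here.

```python
# Hypothetical sketch of contention-based adaptive routing: each router
# tracks how many packets are currently requesting each output port, and
# misroutes (Valiant-style non-minimal routing) only when the minimal
# output's contention exceeds a threshold. Details are illustrative.

class ContentionRouter:
    def __init__(self, num_ports: int, threshold: int = 2):
        self.counters = [0] * num_ports  # packets contending per output port
        self.threshold = threshold

    def packet_requests(self, port: int) -> None:
        self.counters[port] += 1         # a packet starts waiting on `port`

    def packet_departs(self, port: int) -> None:
        self.counters[port] -= 1         # the packet is granted and leaves

    def choose_route(self, minimal_port: int, nonminimal_port: int) -> int:
        # Divert to the non-minimal path only under observed contention.
        if self.counters[minimal_port] > self.threshold:
            return nonminimal_port
        return minimal_port

r = ContentionRouter(num_ports=8, threshold=2)
for _ in range(3):
    r.packet_requests(0)     # three packets pile up on the minimal port
print(r.choose_route(0, 5))  # 5 -> misroutes once contention exceeds 2
```

Because the counters react as soon as packets queue up (rather than waiting for buffer occupancy to build), this style of metric can respond to traffic changes faster than congestion-based indicators.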
The second proposal is a novel mechanism, denoted FlexVC, which relaxes the virtual channel restrictions imposed by the deadlock avoidance mechanism. FlexVC reduces the number of buffers required and balances their use. It also allows employing more resources than strictly required by the routing and deadlock avoidance mechanisms in order to achieve higher performance. FlexVC improves performance with all routing mechanisms under each of the traffic patterns evaluated. Simulation results indicate that its benefits remain similar or improve when adaptive routing is used instead of oblivious routing, and that FlexVC saves more resources with adaptive routing. FlexVC can be combined with contention counters to improve the identification of traffic scenarios under in-transit adaptive routing, achieving the best overall performance while halving the number of buffers required in the router.
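The idea of relaxing virtual-channel restrictions can be illustrated with a toy contrast, assuming a distance-based deadlock-avoidance scheme in which the h-th hop of a path would classically be forced onto VC h. The function names and the exact relaxation rule below are assumptions for exposition, not the thesis implementation.

```python
# Illustrative contrast between a strict distance-based VC assignment and
# a FlexVC-style relaxation. All details are assumptions for exposition.

def strict_vcs(hop_index: int, num_vcs: int) -> list:
    """Classic scheme: the h-th hop may use only VC h."""
    return [hop_index] if hop_index < num_vcs else []

def flex_vcs(hop_index: int, num_vcs: int) -> list:
    """Relaxed scheme: the h-th hop may use any VC >= h. The VC index
    still never decreases along a path (so no cyclic dependency can form),
    but load spreads over more buffers for better balance."""
    return list(range(hop_index, num_vcs))

print(strict_vcs(1, 4))  # [1]
print(flex_vcs(1, 4))    # [1, 2, 3]
```

The relaxed scheme shows why fewer VCs can suffice and why usage balances out: early hops, which under the strict scheme would monopolize the low-index buffers, can spill into any higher-index VC.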