MUSA: A Multi-Level Simulation Infrastructure for Next-Generation HPC Machines

The complexity of High Performance Computing (HPC) systems is increasing, both in the number of components and in their heterogeneity. Interactions between software and hardware involve many aspects that are typically not transparent to scientific programmers and system architects. Predicting the behavior of current scientific applications on future HPC infrastructures is therefore a challenging task.

In this paper we present MUSA, an end-to-end methodology that employs a multi-level simulation infrastructure. By combining different levels of abstraction, MUSA is able to model the communication network, microarchitectural details and system software interactions, providing different trade-offs between simulation cost and accuracy. We compare detailed MUSA simulations with native executions on up to 2,048 cores and find relative errors that are within 10% in the common case. In addition, we use MUSA to simulate up to 16,384 cores and successfully identify scalability bottlenecks caused by factors such as memory contention and load imbalance. We also compare different system configurations, showing how MUSA can help system designers assess the usefulness of future technologies in next-generation HPC machines.
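
To make the accuracy claim concrete, the following minimal sketch shows one common way to compute the relative error of a simulated execution time against a native measurement. It assumes the error is defined as |simulated - native| / native; the abstract does not state the exact definition used. The function name `relative_error` and all timings are hypothetical and chosen for illustration only; they are not results from MUSA.

```python
def relative_error(simulated: float, native: float) -> float:
    """Relative error of a simulated time against the native measurement.

    Assumed definition: |simulated - native| / native.
    """
    if native <= 0:
        raise ValueError("native time must be positive")
    return abs(simulated - native) / native


# Hypothetical (simulated, native) execution times in seconds, keyed by
# core count. These values are illustrative, not taken from the paper.
runs = {
    256: (10.4, 10.0),
    1024: (3.1, 2.9),
    2048: (1.9, 1.8),
}

# Relative error per core count, and whether it falls within 10%.
errors = {cores: relative_error(sim, nat) for cores, (sim, nat) in runs.items()}
within_10_percent = {cores: err <= 0.10 for cores, err in errors.items()}

for cores in sorted(runs):
    print(f"{cores} cores: relative error {errors[cores]:.1%}")
```

Normalizing by the native time keeps the metric comparable across core counts, where absolute execution times can differ by an order of magnitude.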