This deliverable describes the results of porting and tuning, for ARM, the applications considered in the Mont-Blanc 3 co-design process. We present our optimization experiences for 9 applications. For most of them we performed evaluations on Mont-Blanc 3
platforms and analyses on both ARM and Intel-based platforms. Building on Deliverable 6.1, which tried to identify the root causes of scaling issues, we continued the analysis and implemented optimizations to overcome these performance problems.
Overall, load imbalance was identified as a very important issue. Most applications needed to address it, either by improving algorithms and domain decompositions, by distributing their resources over fewer MPI ranks and more threads, or by using dynamic load balancing (DLB).
For the HPCG benchmark, a performance analysis and an initial algorithmic optimization are presented. This work is not ARM-specific, but it has been tested on an ARM-based cluster by a team of students. Following the directive in , Lulesh has been ported to OmpSs and tested on the Mont-Blanc 3 mini-clusters.
In the ARM ecosystem, the ARM Performance Libraries (ARMPL) have been evaluated on QuantumESPRESSO, a widely used scientific suite. The results show speedups when using ARMPL for linear algebra workloads and highlight opportunities to improve its FFT functions.
In addition, the recently released ARM compiler was compared to GCC. Performance and usability were comparable; we suggest further investigation into which compiler is preferable for which type of workload.
For the applications in cardiac modelling and mesh deformation, we generally find that optimizations stemming from analysis on Intel systems are advantageous for ARM systems and vice versa. For example, the work to scale to the high core density of the ThunderX system proved valuable for performance on many-core x86 systems. For some of these applications, power measurements are presented; these numbers will be used as a baseline when comparing performance and power figures on the final Mont-Blanc 3 demonstrator, currently under deployment in WP3.