Mesh deformation is a performance-critical part of many problems in fluid dynamics. Mesh deformation methods based on radial basis function (RBF) interpolation have addressed the increasing complexity for larger data sets. Recently, a domain decomposition method has been introduced which allows these algorithms to be mapped well to distributed memory systems. Because heterogeneous systems have proven to be more time and energy efficient for some applications, the HPC resources available to engineering users and researchers are becoming increasingly heterogeneous.
In this paper, we describe two optimisations performed on an RBF-based interpolation solver for mesh deformation. Motivated by a theoretical performance analysis, the existing MPI distributed model was extended to hybrid parallelisation with OpenMP to achieve better scaling efficiency on systems with hundreds of cores. In addition, a compile-time auto-tuning step was introduced which selects a threshold for choosing between code paths; it yields up to twice the performance of a constant-parameter approach across all test systems.
Our results indicate scaling efficiency in excess of 50–70 % for fully utilised dual-socket systems of up to 96 cores, which is above the theoretical ideal performance of the baseline code. Using a single GPU improves time and energy to solution when a single CPU core is used. We also identify constraints of the applied domain decomposition which degrade performance when a single GPU is combined with many CPU cores.