
Asynchronous progress design for a MPI-based PGAS one-sided communication system

Remote-memory-access models, also known as one-sided communication models, are becoming an interesting alternative to traditional two-sided communication models in the field of High Performance Computing. While remote-memory-access communication primitives are available in the latest version of the Message-Passing Interface, MPI-3, using them efficiently is challenging, in particular if one needs to exploit data locality on hierarchical, distributed systems and at the same time overlap computation and communication to hide communication latencies. In this paper we extend previous work on MPI-based, data-locality-aware remote-memory-access models with an asynchronous progress engine for non-blocking communication operations.
Most previous related work suggests driving communication progress through an additional thread within the application process. In contrast, our scheme uses an arbitrary number of dedicated processes (a process-based approach) to drive asynchronous progress.
These dedicated processes can serve a large number of application processes running on the same node and thus use resources more efficiently. Further, we describe a prototypical library implementation of our concepts, namely DART, which is used to quantitatively evaluate our design against an MPI-3 baseline reference. The evaluation consists of a micro-benchmark to measure the overlap of communication and computation and a scientific application kernel to assess the total performance impact on realistic use cases.