Joseph Schuchart of HLRS won the Best Paper Award at IWOMP 2018, the International Workshop on OpenMP, for this paper.
The OpenMP tasking directives promise to expose a larger degree of concurrency to the runtime than traditional worksharing constructs, which is especially useful for irregular applications. In combination with process-based parallelization such as MPI, the taskyield construct in OpenMP can become crucial, as it helps hide communication latencies by allowing a thread to execute another task while a communication operation is pending. Unfortunately, the OpenMP standard provides few guarantees on the characteristics of the taskyield operation. In this paper, we explore different potential implementations of taskyield and present a portable black-box tool for determining the actual implementation used by any OpenMP compiler/runtime. Furthermore, we discuss the impact of the different taskyield implementations on the task design of the communication-heavy Blocked Cholesky Factorization and the resulting difference in performance, which we found to be over 20%.
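The latency-hiding pattern described above can be sketched in a few lines of C. This is only an illustration and not code from the paper: `fake_request`, `fake_test`, `run_demo`, and the two constants are hypothetical names, and `fake_test` merely simulates the kind of completion check that a call such as `MPI_Test` would perform on a real nonblocking operation. One task polls the pending "communication" and calls taskyield between polls, which gives the runtime an opportunity (but, as the abstract notes, no guarantee) to run the independent work tasks in the meantime.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_WORK_TASKS 4     /* hypothetical number of independent tasks */
#define POLLS_UNTIL_DONE 3   /* hypothetical: request completes on the 3rd test */

/* Hypothetical stand-in for a nonblocking communication request. */
typedef struct { int polls_left; } fake_request;

/* Returns 1 once the simulated operation has completed, 0 otherwise. */
static int fake_test(fake_request *req) {
    if (req->polls_left > 0) req->polls_left--;
    return req->polls_left == 0;
}

/* Returns the number of work tasks executed, or -1 if the
 * communication task did not finish. */
int run_demo(void) {
    int work_done = 0;
    int comm_done = 0;
    fake_request req = { POLLS_UNTIL_DONE };

    #pragma omp parallel
    #pragma omp single
    {
        /* Communication task: poll, and yield while the request is pending. */
        #pragma omp task shared(req, comm_done)
        {
            while (!fake_test(&req)) {
                /* The runtime may switch to another task here, or do nothing. */
                #pragma omp taskyield
            }
            comm_done = 1;
        }

        /* Independent tasks that can overlap with the pending request. */
        for (int i = 0; i < NUM_WORK_TASKS; i++) {
            #pragma omp task shared(work_done)
            {
                #pragma omp atomic
                work_done++;
            }
        }

        /* Wait for all tasks created above. */
        #pragma omp taskwait
    }

    assert(comm_done == 1);
    return comm_done ? work_done : -1;
}
```

Calling `run_demo()` from a `main` function returns `NUM_WORK_TASKS` regardless of how the runtime implements taskyield, because the simulated request completes after a fixed number of polls. A real application has no such bound: if taskyield is a no-op, the polling thread simply spins until the communication finishes, which is the kind of behavioral difference the abstract refers to.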