Improving concurrency and asynchrony in multithreaded MPI applications using software offloading
Reading group: Alexis Lescouet presented "Improving concurrency and asynchrony in multithreaded MPI applications using software offloading" (SC'15) at TSP/Palaiseau - 1D19 on 16/10/2020 at 11:00.
You can find the video of the presentation here and the slides here.
Abstract
We present a new approach for multithreaded communication and asynchronous progress in MPI applications, wherein we offload communication processing to a dedicated thread. The central premise is that given the rapidly increasing core counts on modern systems, the improvements in MPI performance arising from dedicating a thread to drive communication outweigh the small loss of resources for application computation, particularly when overlap of communication and computation can be exploited. Our approach allows application threads to make MPI calls concurrently, enqueuing these as communication tasks to be processed by a dedicated communication thread. This not only guarantees progress for such communication operations, but also reduces load imbalance. Our implementation additionally significantly reduces the overhead of mutual exclusion seen in existing implementations for applications using MPI_THREAD_MULTIPLE. Our technique requires no modification to the application, and we demonstrate significant performance improvement (up to 2X) for QCD, 1-D FFT and deep learning CNN applications.
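The enqueue-and-offload idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain Python threads and a queue instead of MPI, and the names `OffloadEngine`, `fake_send`, and `demo` are hypothetical, introduced only for illustration. Application threads submit "communication tasks" to a queue, and a single dedicated thread executes them, so the stand-in communication routine is never entered by two threads at once.

```python
import queue
import threading


class OffloadEngine:
    """Hypothetical helper: one dedicated thread drains a queue of tasks."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(
            target=self._run, name="comm-thread", daemon=True
        )
        self._worker.start()

    def _run(self):
        # Only this thread ever executes the submitted communication calls.
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            func, args, handle = item
            try:
                handle["result"] = func(*args)
            except Exception as exc:  # hand the error back to the caller
                handle["error"] = exc
            finally:
                handle["event"].set()

    def submit(self, func, *args):
        """Enqueue a task and return a handle, loosely like a request object."""
        handle = {"event": threading.Event(), "result": None, "error": None}
        self._tasks.put((func, args, handle))
        return handle

    def wait(self, handle):
        """Block until the dedicated thread has completed the task."""
        handle["event"].wait()
        if handle["error"] is not None:
            raise handle["error"]
        return handle["result"]

    def shutdown(self):
        self._tasks.put(None)
        self._worker.join()


def fake_send(log, dest, payload):
    """Stand-in for a communication call (assumption: no real MPI here)."""
    log.append((dest, payload, threading.current_thread().name))
    return len(payload)


def demo(num_threads=4):
    engine = OffloadEngine()
    log = []  # touched only by the communication thread, so no lock is needed
    results = [None] * num_threads

    def app_thread(i):
        handle = engine.submit(fake_send, log, i, b"x" * (i + 1))
        # Application computation could overlap with communication here.
        results[i] = engine.wait(handle)

    threads = [threading.Thread(target=app_thread, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    engine.shutdown()
    return log, results
```

Calling `demo()` starts four application threads that each submit one task. In this sketch the application code calls `submit` and `wait` explicitly, whereas the paper states that its technique requires no modification to the application.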