S. Pellegrini, T. Hoefler, T. Fahringer:

 On the Effects of CPU Caches on MPI Point-to-Point Communications

(In Proceedings of the 2012 IEEE International Conference on Cluster Computing, presented in Beijing, China, pages 495--503, IEEE Computer Society, ISBN: 978-0-7695-4807-4, Sep. 2012)


Several researchers have investigated the placement of communication calls in message-passing parallel codes. The current rule of thumb is to maximize communication/computation overlap with early binding. In this work, we demonstrate that this is not the only design constraint because CPU caches can have a significant impact on communications. We conduct an empirical study of the interaction between CPU caching and communications for several different communication scenarios. We use the gained insight to formulate a set of intuitive rules for communication call placement and show how our rules can be applied to practical codes. Our optimized codes show an improvement of up to 80% for a simple stencil code. Our work is a first step towards communication optimizations by moving communication calls. We expect that future communication-aware compilers will use our insights as a standard technique to move communication calls in order to optimize performance.


  author={S. Pellegrini and T. Hoefler and T. Fahringer},
  title={{On the Effects of CPU Caches on MPI Point-to-Point Communications}},
  booktitle={Proceedings of the 2012 IEEE International Conference on Cluster Computing},
  location={Beijing, China},
  publisher={IEEE Computer Society},

