(In Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS'17), presented in Orlando, FL, USA, IEEE, May 2017)
Abstract
The constantly increasing gap between communication and computation performance emphasizes the importance of communication-avoidance techniques. Caching is a well-known concept used to reduce accesses to slow local memories. In this work, we extend the caching idea to MPI-3 Remote Memory Access (RMA) operations. Here, caching can avoid inter-node communication and achieve benefits for irregular applications similar to those that communication-avoiding algorithms achieve for structured applications. We propose CLaMPI, a caching library layered on top of MPI-3 RMA, to automatically optimize code with minimal user intervention. We demonstrate how cached RMA improves the performance of a Barnes-Hut simulation and a Local Clustering Coefficient computation by factors of up to 1.8x and 5x, respectively. Due to the low overheads in the cache miss case and the potential benefits, we expect that our ideas around transparent RMA caching will soon be an integral part of many MPI libraries.
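The idea of caching remote reads can be illustrated with a small, purely local simulation. The sketch below is not CLaMPI's API and does not use MPI: the class `CachedWindow`, its methods, the dictionary that stands in for remote memory, and the LRU eviction policy are all assumptions made for illustration. It only shows the general pattern in which a repeated get of the same remote range is served locally instead of triggering another transfer.

```python
from collections import OrderedDict


class CachedWindow:
    """Toy stand-in for an RMA window with a read cache (hypothetical, not CLaMPI)."""

    def __init__(self, remote_memory, capacity=4):
        # remote_memory: dict mapping a target rank to a bytes object.
        # It simulates memory exposed by other processes (assumption).
        self.remote_memory = remote_memory
        self.capacity = capacity          # maximum number of cached entries
        self.cache = OrderedDict()        # (rank, offset, size) -> bytes, LRU order
        self.remote_gets = 0              # counts simulated network transfers

    def _remote_get(self, rank, offset, size):
        # Stands in for an actual one-sided get over the network.
        self.remote_gets += 1
        return self.remote_memory[rank][offset:offset + size]

    def get(self, rank, offset, size):
        key = (rank, offset, size)
        if key in self.cache:
            # Cache hit: no communication, just refresh the LRU position.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: fetch remotely, then store the result.
        data = self._remote_get(rank, offset, size)
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry (assumed policy).
            self.cache.popitem(last=False)
        return data

    def invalidate(self):
        # Drop all entries, e.g. when remote data may have changed.
        self.cache.clear()


win = CachedWindow({1: b"abcdefgh"}, capacity=2)
first = win.get(1, 0, 4)    # miss: one simulated remote transfer
second = win.get(1, 0, 4)   # hit: served from the local cache
```

In this sketch a miss costs one dictionary lookup on top of the simulated transfer, while a hit avoids the transfer entirely. When and how a real library invalidates its cache depends on the RMA synchronization semantics, which the sketch does not model.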
BibTeX
@inproceedings{rma-caching,
  author={Salvatore Di Girolamo and F. Vella and Torsten Hoefler},
  title={{Transparent Caching for RMA Systems}},
  year={2017},
  month={May},
  booktitle={Proceedings of the 31st IEEE International Parallel \& Distributed Processing Symposium (IPDPS'17)},
  location={Orlando, FL, USA},
  publisher={IEEE},
  source={http://www.unixer.de/~htor/publications/},
}