Publications of Torsten Hoefler
Torsten Hoefler, Timo Schneider and Andrew Lumsdaine:

 Accurately Measuring Overhead, Communication Time and Progression of Blocking and Nonblocking Collective Operations at Massive Scale

(International Journal of Parallel, Emergent and Distributed Systems. Vol 25, Nr. 4, pages 241-258, Taylor & Francis Group, ISSN: 1744-5779, Jul. 2010)

Abstract

Accurate, reproducible and comparable measurement of the overheads, communication times and progression behavior of blocking and nonblocking collective operations is a complicated task. Although different measurement schemes for blocking collective operations are implemented in well-known benchmarks, many of these schemes introduce different systematic errors in their measurements. We characterize these errors and select a window-based approach as the most accurate method. However, this approach complicates measurements significantly and introduces clock synchronization as a new source of errors. We analyze approaches to avoid or correct those errors and develop a scalable synchronization scheme to conduct benchmarks on massively parallel systems. Our results are compared to the window-based scheme implemented in the SKaMPI benchmarks and show a reduction of the synchronization overhead by a factor of 16 on 128 processes. We also describe two different measurement schemes for the overhead and asynchronous progress of nonblocking collective communications. An implementation and results of both measurement schemes are presented.
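To illustrate why a window-based scheme depends on clock synchronization, the following is a minimal sketch in Python. It is not the synchronization scheme developed in the article: it assumes a generic ping-pong offset estimate with symmetric network latency, and all function names and parameters are hypothetical. Each process estimates the offset between a reference clock and its own clock, then converts the agreed global start time of a measurement window into local time.

```python
# Illustrative sketch only; names and the symmetric-latency assumption
# are hypothetical and not taken from the article.

def estimate_offset(local_send, remote_stamp, local_recv):
    """Estimate (reference clock - local clock) from one ping-pong exchange.

    Assumes the message takes equally long in both directions, so the
    reference timestamp corresponds to the midpoint of the round trip.
    """
    midpoint = (local_send + local_recv) / 2.0
    return remote_stamp - midpoint


def best_offset(samples):
    """Use the exchange with the smallest round-trip time.

    Each sample is a tuple (local_send, remote_stamp, local_recv). A short
    round trip leaves less room for asymmetric delays to distort the estimate.
    """
    send, stamp, recv = min(samples, key=lambda s: s[2] - s[0])
    return estimate_offset(send, stamp, recv)


def window_start_local(global_start, window_index, window_length, offset):
    """Translate the start of a window from reference time to local time.

    offset is (reference clock - local clock), as returned by best_offset.
    """
    return global_start + window_index * window_length - offset


samples = [
    (10.0, 110.5, 11.0),  # round trip of 1.0 time units
    (20.0, 121.7, 23.0),  # round trip of 3.0 time units, less trustworthy
]
offset = best_offset(samples)
start = window_start_local(200.0, 3, 0.5, offset)
```

In such a scheme every process would wait until its local clock reaches `start` before invoking the collective operation, so any error in `offset` shifts that process's start time within the window and shows up directly in the measured times.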


BibTeX

@article{hoefler-collmea,
  author={Torsten Hoefler and Timo Schneider and Andrew Lumsdaine},
  title={{Accurately Measuring Overhead, Communication Time and Progression of Blocking and Nonblocking Collective Operations at Massive Scale}},
  journal={International Journal of Parallel, Emergent and Distributed Systems},
  year={2010},
  month={Jul.},
  pages={241--258},
  volume={25},
  number={4},
  publisher={Taylor \& Francis Group},
  issn={1744-5779},
  source={http://www.unixer.de/~htor/publications/},
}


© Torsten Hoefler