Publications of Torsten Hoefler
Copyright Notice:

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

T. Hoefler, T. Schneider and A. Lumsdaine:

 Characterizing the Influence of System Noise on Large-Scale Applications by Simulation

(In International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10), Nov. 2010)
SC10 Best Paper Award

Abstract

Although system noise is increasingly a concern as HPC systems continue to grow in scale, existing studies with artificial noise models provide only limited insight into application behavior. This paper presents an in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings. Our analytical model shows the particular circumstances under which noise is propagated or absorbed. The model shows that not only collective operations but also point-to-point communications influence the application's sensitivity to noise. We present a simulation toolchain that injects noise delays from traces gathered on four common large-scale architectures into a LogGPS simulation and allows new insights into the scaling of applications in noisy environments. We investigate collective operations with up to 1 million processes and three applications (Sweep3D, AMG, and POP) with up to 32,000 processes. We show that the scale at which noise becomes a bottleneck is system-specific and depends on the structure of the noise. Simulations with different network speeds show that a 10x faster network does not improve application scalability because noise becomes a bottleneck at scale. We quantify this noise bottleneck and conclude that our tools can be utilized to tune the noise signatures of a specific system for minimal noise propagation. For example, our simulations verify the long-standing conjecture that co-scheduling prevents significant application slowdown.
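
Illustrative sketch (not part of the paper): the toy Python model below may help to picture what "injecting noise delays into a simulation" means. It is not the authors' toolchain and not a LogGPS simulator. It assumes only a per-message CPU overhead `o` and a wire latency `L`, replaces measured traces with synthetic periodic detours, and uses a dissemination barrier as a stand-in collective. All function names and parameter values are hypothetical.

```python
import random


def advance(start, work, phase, period, detour):
    """Return the time at which `work` units of CPU time begun at `start` finish.

    Assumed noise model: the rank is unavailable during
    [phase + k*period, phase + k*period + detour) for every integer k.
    Requires 0 <= detour < period.
    """
    if detour <= 0:
        return start + work
    t, remaining = start, work
    k = (t - phase) // period  # latest detour that starts at or before t
    while True:
        d_start = phase + k * period
        d_end = d_start + detour
        if t < d_start:
            gap = d_start - t  # uninterrupted CPU time before this detour
            if remaining <= gap:
                return t + remaining
            remaining -= gap
            t = d_end  # the detour interrupts the work
        elif t < d_end:
            t = d_end  # work was started inside a detour
        k += 1


def dissemination_barrier(ready, phases, o, L, period, detour):
    """Per-rank exit times of a dissemination barrier entered at `ready`.

    In the round with distance d, rank i sends to (i + d) % P and waits
    for the message from (i - d) % P. Sending and receiving each cost `o`
    CPU time (subject to noise); the wire latency `L` is not.
    """
    num_ranks = len(ready)
    t = list(ready)
    dist = 1
    while dist < num_ranks:
        sent = [advance(t[i], o, phases[i], period, detour)
                for i in range(num_ranks)]
        nxt = []
        for i in range(num_ranks):
            arrival = sent[(i - dist) % num_ranks] + L
            nxt.append(advance(max(sent[i], arrival), o,
                               phases[i], period, detour))
        t = nxt
        dist *= 2
    return t


def run(num_ranks, iterations, work, o, L, period, detour, phases):
    """Completion time of `iterations` compute-then-barrier steps."""
    t = [0] * num_ranks
    for _ in range(iterations):
        t = [advance(t[i], work, phases[i], period, detour)
             for i in range(num_ranks)]
        t = dissemination_barrier(t, phases, o, L, period, detour)
    return max(t)


if __name__ == "__main__":
    # All values are made-up, unitless integers chosen for illustration.
    P, ITER, WORK, O, LAT, PERIOD, DETOUR = 128, 100, 100, 1, 2, 1000, 50
    rng = random.Random(0)
    same = [0] * P                                    # co-scheduled noise
    rand = [rng.randrange(PERIOD) for _ in range(P)]  # unsynchronized noise
    print("noise-free    :", run(P, ITER, WORK, O, LAT, PERIOD, 0, same))
    print("co-scheduled  :", run(P, ITER, WORK, O, LAT, PERIOD, DETOUR, same))
    print("unsynchronized:", run(P, ITER, WORK, O, LAT, PERIOD, DETOUR, rand))
```

In this toy setting, detours that hit all ranks at the same time cost each iteration at most one detour, whereas unsynchronized detours on different ranks are passed on through the barrier, so the unsynchronized run should finish noticeably later than the co-scheduled one. A detour that falls entirely within a wait for a message is absorbed, which is a small-scale analogue of the propagation and absorption discussed in the abstract. It says nothing about the magnitudes reported in the paper.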

BibTeX

@inproceedings{hoefler-noise-sim,
  author={T. Hoefler and T. Schneider and A. Lumsdaine},
  title={{Characterizing the Influence of System Noise on Large-Scale Applications by Simulation}},
  year={2010},
  month={Nov.},
  booktitle={International Conference for High Performance Computing, Networking, Storage and Analysis (SC'10)},
  source={http://www.unixer.de/~htor/publications/},
}

© Torsten Hoefler