Netgauge - LogP, LogGP, and LogGPS Measurement

Description

The loggp pattern in Netgauge allows precise measurement of the LogP [2], LogGP [3], and LogGPS [4] parameters of MPI implementations. Only MPI is supported right now, but the modifications required to support other modules (e.g., TCP, UDP) should be minimal. The loggp pattern employs the techniques described in "Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks" [1].

General

The benchmarks use Netgauge's high-performance timers for different architectures. Users should make sure that the configure script detected the timer correctly and that it works reliably (no frequency scaling etc.). The benchmark results can be hard to interpret because of the measurement method used. We decided against returning single parameters because that could easily lead to wrong results or even negative parameters. Instead, the benchmark returns results for each message size (as it proceeds) together with the quality of the fit. A possible invocation would be:

  $ mpirun -n 2 ./netgauge -s 1-8192 -x loggp
  # Info: (0): Netgauge v2.1 MPI enabled (P=2) (./netgauge -s 1-8192 -x loggp )
  # initializing x86-64 timer (takes some seconds)
  # Info: (0): Warming module mpi up ... this may take a while
  Testing 1 bytes 100 times: L=0.6128 s=1 o_s=0.266 o_r=0.547 g=nan G=nan (nan GiB/s) lsqu(g,G)=nan
  Testing 1025 bytes 100 times: L=0.6128 s=1025 o_s=0.458 o_r=1.163 g=0.448 G=0.000583 (13.405 GiB/s) lsqu(g,G)=inf
  Testing 2049 bytes 100 times: L=0.6128 s=2049 o_s=0.528 o_r=1.542 g=0.483 G=0.000478 (16.349 GiB/s) lsqu(g,G)=0.0877
  Testing 3073 bytes 100 times: L=0.6128 s=3073 o_s=0.622 o_r=1.806 g=0.534 G=0.000404 (19.338 GiB/s) lsqu(g,G)=0.1156
  Testing 4097 bytes 100 times: L=0.6128 s=4097 o_s=0.714 o_r=1.867 g=0.612 G=0.000328 (23.820 GiB/s) lsqu(g,G)=0.1706
  Testing 5121 bytes 100 times: L=0.6128 s=5121 o_s=0.781 o_r=2.050 g=0.689 G=0.000272 (28.745 GiB/s) lsqu(g,G)=0.2028
  Testing 6145 bytes 100 times: L=0.6128 s=6145 o_s=0.869 o_r=2.190 g=0.749 G=0.000236 (33.063 GiB/s) lsqu(g,G)=0.2127
  Testing 7169 bytes 100 times: L=0.6128 s=7169 o_s=0.967 o_r=2.326 g=0.801 G=0.000211 (37.018 GiB/s) lsqu(g,G)=0.2169

The actual parameters are reported for each data size. The latency L is half of the round-trip time of a 1-byte message (and does not depend on the data size).
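
For post-processing, the per-size result lines can be turned into records with a small script. The following is a minimal sketch in Python, not part of Netgauge; the names LINE_RE and parse_loggp_output are hypothetical, and the pattern assumes that result lines have exactly the field layout of the sample run shown above.

```python
import math
import re

# Matches the result fields of one measurement, e.g.
# "L=0.6128 s=2049 o_s=0.528 o_r=1.542 g=0.483 G=0.000478 (16.349 GiB/s) lsqu(g,G)=0.0877"
# Assumption: the field order and names are those of the sample run.
LINE_RE = re.compile(
    r"L=(?P<L>\S+)\s+s=(?P<s>\d+)\s+o_s=(?P<o_s>\S+)\s+o_r=(?P<o_r>\S+)\s+"
    r"g=(?P<g>\S+)\s+G=(?P<G>\S+)\s+\(.*?\)\s+lsqu\(g,G\)=(?P<lsqu>\S+)"
)


def parse_loggp_output(text):
    """Return one dict per result line; 'nan' and 'inf' become float nan/inf."""
    records = []
    for m in LINE_RE.finditer(text):
        rec = {key: float(value) for key, value in m.groupdict().items()}
        rec["s"] = int(m.group("s"))  # message size in bytes is an integer
        records.append(rec)
    return records


# Two lines copied from the sample run
sample = """\
Testing 1 bytes 100 times: L=0.6128 s=1 o_s=0.266 o_r=0.547 g=nan G=nan (nan GiB/s) lsqu(g,G)=nan
Testing 2049 bytes 100 times: L=0.6128 s=2049 o_s=0.528 o_r=1.542 g=0.483 G=0.000478 (16.349 GiB/s) lsqu(g,G)=0.0877
"""

for rec in parse_loggp_output(sample):
    # Skip rows where the fit was not yet possible (g is nan)
    if not math.isnan(rec["g"]):
        print(rec["s"], rec["g"], rec["G"], rec["lsqu"])
```

Rows with nan values are kept in the returned list so that the caller can decide how to treat measurements for which no fit was available yet.
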
The send overhead o_s is computed as described in [1]; the receive overhead o_r is simply the time it takes to finish MPI_Recv() (and is thus not very accurate). The parameters g and G are computed by curve fitting. The fit needs at least two points, so they cannot be computed for the first measurement (nan). The more measurement points are considered, the more accurate the results become. The last value, "lsqu(g,G)", is the least-squares deviation of the fit for g and G. The lower this number, the better the fit and the results. Parameter changes are detected by sudden changes in the least-squares deviation. Please refer to [1] for details.

The benchmark also creates a file "ng.out", which can be plotted for visual analysis of the results. One possible plot in gnuplot would be:

  plot "ng.out" using 1:($4-$3)/($2-1)

This plots the points that the g,G line is fitted to. Alternatively,

  plot "ng.out" using 1:7

plots the send overhead for varying data sizes.
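
To illustrate the kind of fit described above, the following Python sketch computes the same points as the gnuplot expression 1:($4-$3)/($2-1) and fits a straight line through them with ordinary least squares. It is not Netgauge's fitting code: the function names are hypothetical, the sum of squared residuals returned here is not necessarily normalized like lsqu(g,G) (see [1] for the exact definition), and the input rows are synthetic. It only assumes that "ng.out" consists of whitespace-separated numeric columns, as gnuplot expects.

```python
def read_fit_points(lines):
    """Return (col1, (col4 - col3) / (col2 - 1)) for each data row.

    Mirrors the gnuplot expression 1:($4-$3)/($2-1); gnuplot columns are 1-based.
    """
    points = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop gnuplot-style comments
        if not line:
            continue
        cols = [float(c) for c in line.split()]
        if len(cols) < 4 or cols[1] == 1:
            continue  # too few columns, or the divisor ($2-1) would be zero
        points.append((cols[0], (cols[3] - cols[2]) / (cols[1] - 1)))
    return points


def fit_line(points):
    """Least-squares fit y = a + b*x; returns (a, b, sum of squared residuals)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    return intercept, slope, ssr


# Synthetic rows for illustration only (NOT real ng.out content).
# With a real file: with open("ng.out") as f: points = read_fit_points(f)
rows = [
    "# c1 c2 c3 c4",
    "1 16 2.0 9.5",
    "1025 16 3.0 18.0",
    "2049 16 4.0 26.5",
]
points = read_fit_points(rows)
intercept, slope, ssr = fit_line(points)
print("intercept (approx. g):", intercept)
print("slope (G):", slope)
print("sum of squared residuals:", ssr)
```

The intercept corresponds to the value of the fitted line at s=0 and the slope to its gradient, which matches the rough reading of g and G from the plot.
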

Getting the LogGPS parameters

Extracting the actual parameters from the output is difficult and requires some understanding of the technique used. Please refer to [1] in order to understand the measurement method. A rough guide for each of the parameters is given below:

  L: simply use the displayed L (round-trip/2). Sometimes it is advisable to subtract o_s and/or o_r; however, this can lead to negative latencies (as o_s can happen after the message has been sent).
  o_s: is defined to be constant (per packet) in the LogP model; however, it is often not constant in practice (per message, which might consist of multiple packets). You should use the o_s of the desired packet size.
  o_r: is relatively imprecise and should be used carefully. Please contact the author if you know a precise measurement method for o_r.
  g: is approximately the point where the fitted curve crosses the y axis (s=0). However, some systems don't have ideal transmission curves. It is advisable to use a g with sufficiently many points to fit and a small lsqu(g,G).
  G: is the slope of the fitted g,G curve. It is advisable to use a G with sufficiently many points to fit and a small lsqu(g,G).
  S: is the message size at which the library switches from eager to rendezvous. While the library is not obliged to do this at all, it is commonly done in MPI libraries. The benchmark monitors the deviation and tries to detect protocol changes. However, it is safest to investigate the plot manually.
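
As a starting point for finding S, a sudden jump in lsqu(g,G) can be flagged automatically before inspecting the plot by hand. The Python sketch below is an assumption-laden illustration, not the detection logic used by Netgauge: the function name find_protocol_change and the threshold factor are hypothetical, and a jump is simply defined as a deviation that exceeds the previous finite deviation by that factor.

```python
import math


def find_protocol_change(rows, factor=2.0):
    """Return the first size whose lsqu exceeds `factor` times the previous
    finite lsqu, or None if no such jump occurs.

    rows: sequence of (size, lsqu) pairs in increasing size order.
    factor: hypothetical threshold, to be tuned per system.
    """
    prev = None
    for size, lsqu in rows:
        if not math.isfinite(lsqu):
            continue  # nan/inf: no usable fit yet
        if prev is not None and prev > 0 and lsqu > factor * prev:
            return size
        prev = lsqu
    return None


# (size, lsqu) pairs taken from the sample run on this page
sample = [
    (1, math.nan), (1025, math.inf), (2049, 0.0877), (3073, 0.1156),
    (4097, 0.1706), (5121, 0.2028), (6145, 0.2127), (7169, 0.2169),
]
print(find_protocol_change(sample))  # None: no jump in this range

# Synthetic extra point (NOT measured) to show what a detected jump looks like
synthetic = sample + [(8193, 1.9)]
print(find_protocol_change(synthetic))  # 8193
```

A candidate size found this way only marks where the fit quality degrades; whether it really corresponds to the eager/rendezvous switch should still be checked against the plot.
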

References

[1] Torsten Hoefler, Andre Lichei and Wolfgang Rehm:
	Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks. TU Chemnitz. In Proceedings of the 21st IEEE International Parallel & Distributed Processing Symposium, PMEO'07 Workshop, presented in Long Beach, CA, USA, IEEE Computer Society, ISBN: 1-4244-0909-8, Mar. 2007.
[2] David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, Thorsten von Eicken:
	LogP: towards a realistic model of parallel computation. ACM SIGPLAN Notices, Volume 28, Issue 7 (July 1993), Pages: 1-12.
[3] Albert Alexandrov, Mihai F. Ionescu, Klaus E. Schauser, Chris Scheiman:
	LogGP: Incorporating Long Messages into the LogP Model --- One step closer towards a realistic model for parallel computation. Technical Report TRCS95-09, University of California at Santa Barbara, Santa Barbara, CA, USA.
[4] Fumihiko Ino, Noriyuki Fujimoto, Kenichi Hagihara:
	LogGPS: a parallel computational model for synchronization analysis. Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, Pages: 133-142, 2001, ISBN: 1-58113-346-4.


© Torsten Hoefler