General Usage
The benchmarks use Netgauge's high-performance timers for different
architectures. Users should make sure that the configure script detected the
correct timer and that it works reliably (e.g., that CPU frequency scaling is disabled).
General help: mpirun -n 1 ./netgauge -x ebb --help
Benchmarking the effective bisection bandwidth on 64 InfiniBand nodes (one typically
wants large messages for bandwidth measurements; however, the benchmark also allows small
messages):
$ mpirun -n 64 ./netgauge -s 1048576-1048576 -x ebb -r 10
# Info: (0): Netgauge v2.2 MPI enabled (P=64) (./netgauge -s 1048576-1048576 -x ebb -r 10 )
# initializing x86-64 timer (takes some seconds)
size: 1048576, round 0: num: 64 average: 65525.545105 us (320.051057 MiB/s)
size: 1048576, round 1: num: 64 average: 65419.781957 us (320.568479 MiB/s)
size: 1048576, round 2: num: 64 average: 65292.660184 us (321.192611 MiB/s)
size: 1048576, round 3: num: 64 average: 67542.892781 us (310.491884 MiB/s)
size: 1048576, round 4: num: 64 average: 68092.770270 us (307.984532 MiB/s)
size: 1048576, round 5: num: 64 average: 63865.466484 us (328.370263 MiB/s)
size: 1048576, round 6: num: 64 average: 63695.839034 us (329.244741 MiB/s)
size: 1048576, round 7: num: 64 average: 65396.951463 us (320.680392 MiB/s)
size: 1048576, round 8: num: 64 average: 74078.957820 us (283.096855 MiB/s)
size: 1048576, round 9: num: 64 average: 69648.529044 us (301.104995 MiB/s)
# Info: (0): ---- bucket data ----
size: 1048576 54673.559333 (383.577002 MiB/s): 640
size: 1048576 num: 640 average: 66855.939414 (313.682228 MiB/s)
The last line reports an average bandwidth of 313 MiB/s, which is the measured
effective bisection bandwidth (of course, a real measurement would require
many more random bisection patterns, e.g., 100,000).
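For a more statistically meaningful measurement, only the round count needs to be
increased; a sketch with the same message-size and pattern options as above (the run
time grows accordingly):
$ mpirun -n 64 ./netgauge -s 1048576-1048576 -x ebb -r 100000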
This output can be used to compute statistics externally (e.g., using R; a
post-processing sketch follows the bucket example below). However, netgauge also
supports simple statistics by counting the observed bandwidths in buckets. This can
be done by supplying the --buckets (-b) parameter:
$ mpirun -n 64 ./netgauge -s 1048576-1048576 -x ebb -r 1000 -b 50
... (output as before)
size: 1048576, round 999: num: 64 average: 66695.577922 us (314.436439 MB/s)
# Info: (0): ---- bucket data ----
size: 1048576 54543.972000 (384.488317 MiB/s): 41834
size: 1048576 62528.967800 (335.388872 MiB/s): 11706
size: 1048576 70513.963600 (297.409462 MiB/s): 4769
size: 1048576 78498.959400 (267.156662 MiB/s): 1802
size: 1048576 86483.955200 (242.490297 MiB/s): 2570
size: 1048576 94468.951000 (221.993785 MiB/s): 751
size: 1048576 102453.946800 (204.692163 MiB/s): 253
size: 1048576 110438.942600 (189.892437 MiB/s): 255
size: 1048576 118423.938400 (177.088520 MiB/s): 24
size: 1048576 126408.934200 (165.902198 MiB/s): 36
size: 1048576 num: 64000 average: 62274.659839 (336.758483 MiB/s)
We see that 41834 of the 64000 (P times the number of rounds) connections achieved
roughly 384 MiB/s, while 36 connections were heavily congested and reached only about 165 MiB/s.
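As mentioned above, the per-round output can also be post-processed externally. A
minimal sketch, assuming the output format shown above (the awk field position is an
assumption and may need to be adapted to your Netgauge version):
$ mpirun -n 64 ./netgauge -s 1048576-1048576 -x ebb -r 1000 | \
    awk '/round/ { gsub(/[()]/, "", $10); print $10 }' > bandwidths.txt
The resulting file contains one average bandwidth value (in MiB/s) per round and can
be read into R or any other statistics tool.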
The relative effective bisection bandwidth can be determined by repeating
the experiment with 2 processes. This resulted in 379 MiB/s in our run; thus,
the relative effective bisection bandwidth on 64 nodes of our system is 336/379 = 0.89.
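A minimal sketch of this procedure, using the same message size and round count as
above (the 2-process run provides the uncongested baseline; the ratio is computed by hand):
$ mpirun -n 2 ./netgauge -s 1048576-1048576 -x ebb -r 1000
# relative effective bisection bandwidth =
#   (64-node average MiB/s) / (2-process average MiB/s), e.g., 336.758483/379 = 0.89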
For InfiniBand systems, you can also use our ORCS simulator to simulate the
effective bisection bandwidth.