NBCBench - benchmarking Nonblocking MPI Collective Operations Performance


NBCBench is a benchmark that measures overlap and asynchronous progression of nonblocking collective operations implemented in LibNBC. NBCBench is distributed under the BSD license.

Download NBCBench

Performance Results for different MPI Implementations

We present performance results of LibNBC for different MPI implementations. LibNBC issues MPI_Isend() and MPI_Irecv() calls, so both its performance and the achievable overlap depend on how the underlying MPI implementation handles these calls. We also compare the collective operations implemented in LibNBC with the corresponding MPI operations. Please keep in mind that not all collective algorithms in LibNBC are optimized! Results are available for the following MPI implementations:

Benchmark Methodology

We used the overlap benchmark, which was designed to assess the maximum possible overlap and the minimum latencies. The benchmark will be described later. Details can be found in "Accurately Measuring Collective Operations at Massive Scale" [1] and "Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI" [2].
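To make the notion of "overlap" concrete, here is a minimal sketch of how an overlap ratio could be derived from three timings. It is an illustration under stated assumptions, not NBCBench's actual formula (see [1] and [2] for the real methodology). The function name overlap_ratio and its parameters t_comm, t_comp and t_overlapped are hypothetical: the time for the collective alone, the time for the computation alone, and the time when the computation runs between starting and completing the nonblocking collective.

```python
def overlap_ratio(t_comm: float, t_comp: float, t_overlapped: float) -> float:
    """Estimate the fraction of communication time hidden behind computation.

    Assumption (not taken from NBCBench): with no overlap the combined run
    takes t_comm + t_comp, so any time saved relative to that sum counts as
    communication that was hidden. The result is clamped to [0.0, 1.0] to
    absorb measurement noise.
    """
    if t_comm <= 0.0:
        raise ValueError("t_comm must be positive")
    hidden = t_comm + t_comp - t_overlapped
    return max(0.0, min(1.0, hidden / t_comm))


if __name__ == "__main__":
    # Hypothetical timings in microseconds, chosen only for illustration.
    print(overlap_ratio(100.0, 100.0, 120.0))  # prints 0.8
```

A ratio near 1.0 would suggest that the MPI implementation progresses the communication asynchronously, while a ratio near 0.0 would suggest that most of the communication work happens only inside the final wait call.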


[1] Torsten Hoefler, Timo Schneider and Andrew Lumsdaine: Accurately Measuring Collective Operations at Massive Scale. In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium, PMEO'08 Workshop, Miami, FL, ISSN: 1530-2075, ISBN: 978-1-4244-1694-3, Apr. 2008. Invited to a journal special issue on top picks from PMEO'08.
[2] Torsten Hoefler, Andrew Lumsdaine and Wolfgang Rehm: Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI. In Proceedings of the 2007 International Conference on High Performance Computing, Networking, Storage and Analysis, SC07, Reno, USA, IEEE Computer Society/ACM, Nov. 2007. Acceptance rate 20% (54/268).

© Torsten Hoefler