Advanced MPI Programming Tutorial at Supercomputing 2013

Pavan Balaji, Jim Dinan, Rajeev Thakur and I are giving our Advanced MPI Programming tutorial at Supercomputing 2013 on Sunday November 17th.

Are you wondering about the new MPI-3 standard? How it affects you as a scientific or HPC programmer and what nice new features you can use to make your life easier and your application faster? Then you should not miss our tutorial.

Our abstract summarizes the main topics:

The vast majority of production parallel scientific applications today use MPI and run successfully on the largest systems in the world. For example, several MPI applications are running at full scale on the Sequoia system (on ?1.6 million cores) and achieving 12 to 14 petaflops/s of sustained performance. At the same time, the MPI standard itself is evolving (MPI-3 was released late last year) to address the needs and challenges of future extreme-scale platforms as well as applications. This tutorial will cover several advanced features of MPI, including new MPI-3 features, that can help users program modern systems effectively. Using code examples based on scenarios found in real applications, we will cover several topics including efficient ways of doing 2D and 3D stencil computation, derived datatypes, one-sided communication, hybrid (MPI + shared memory) programming, topologies and topology mapping, and neighborhood and nonblocking collectives. Attendees will leave the tutorial with an understanding of how to use these advanced features of MPI and guidelines on how they might perform on different
platforms and architectures.

This tutorial is about advanced use of MPI. It will cover several advanced features that are part of
MPI-1 and MPI-2 (derived datatypes, one-sided communication, thread support, topologies and topology
mapping) as well as new features that were recently added to MPI as part of MPI-3 (substantial additions
to the one-sided communication interface, neighborhood collectives, nonblocking collectives, support for
shared-memory programming).

Implementations of MPI-2 are widely available both from vendors and open-source projects. In addition,
the latest release of the MPICH implementation of MPI supports all of MPI-3. Vendor implementations
derived from MPICH will soon support these new features. As a result, users will be able to use in practice
what they learn in this tutorial.

The tutorial will be example driven, reflecting scenarios found in real applications. We will begin with
a 2D stencil computation with a 1D decomposition to illustrate simple Isend/Irecv based communication.

We will then use a 2D decomposition to illustrate the need for MPI derived datatypes. We will introduce
a simple performance model to demonstrate what performance can be expected and compare it with actual
performance measured on real systems. This model will be used to discuss, evaluate, and motivate the rest
of the tutorial.
We will use the same 2D stencil example to illustrate various ways of doing one-sided communication in
MPI and discuss the pros and cons of the different approaches as well as regular point-to-point communica-
tion. We will then discuss a 3D stencil without getting into complicated code details.
We will use examples of distributed linked lists and distributed locks to illustrate some of the new ad-
vanced one-sided communication features, such as the atomic read-modify-write operations.
We will discuss the support for threads and hybrid programming in MPI and provide two hybrid ver-
sions of the stencil example: MPI+OpenMP and MPI+MPI. The latter uses the new features in MPI-3 for
shared-memory programming. We will also discuss performance and correctness guidelines for hybrid pro-
gramming.

We will introduce process topologies, topology mapping, and the new “neighborhood” collective func-
tions added in MPI-3. These collectives are particularly intended to support stencil computations in a scalable
manner, both in terms of memory consumption and performance.
We will conclude with a discussion of other features in MPI-3 not explicitly covered in this tutorial
(interface for tools, Fortran 2008 bindings, etc.) as well as a summary of recent activities of the MPI Forum
beyond MPI-3.

Our planned agenda for the day is

  1. Introduction (8.30–10.00)
    • Background: What is MPI
    • MPI-1, MPI-2, MPI-3
    • 2D stencil code with 1D decomposition: Isend/Irecv version
    • 2D stencil code with 2D decomposition: Introduce derived datatypes
    • Introduce simple performance modeling and measurement
  2. One-Sided Communication (10.30–12.00)
    • Basics of one-sided communication or remote memory access (RMA)
    • 2D stencil code with 1D decomposition: RMA with 3 forms of synchronization
    • 3D stencil: What changes and what to pay attention to
    • Introduce other features of MPI-3 RMA
    • Linked list or distributed lock example demonstrating new MPI-3 RMA features
  3. Lunch (12.00–1.30)
  4. MPI and Threads (1.30–3.00)
    • What does the MPI standard specify about threads
    • How does it enable hybrid programming
    • Hybrid (MPI+OpenMP) version of 2D stencil code
    • Hybrid (MPI+MPI) version of 2D stencil code using MPI-3 shared-memory support
    • Performance and correctness guidelines for hybrid programming
  5. Topologies, Neighborhood/Nonblocking Collectives (3.30-5.00)
    • Topologies and topology mapping
    • 2D stencil code with 2D decomposition using neighborhood collectives
    • MPI-3 nonblocking collectives with example
    • Summary of other features in MPI-3
    • Summary of recent activities of the MPI Forum
    • Conclusions

We’re looking forward to many interesting discussions!