Using MPI and Using Advanced MPI

  • Using MPI, 3rd Edition at The MIT Press
  • Using Advanced MPI at The MIT Press
These two books, published in 2014, show how to use MPI, the Message Passing Interface, to write parallel programs. Using MPI, now in its 3rd edition, provides an introduction to using MPI, including examples of the parallel computing code needed for simulations of partial differential equations and n-body problems. Using Advanced MPI covers additional features of MPI, including parallel I/O, one-sided (remote memory access) communication, and using threads and shared memory from MPI.
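
To give a flavor of what one-sided (remote memory access) communication means, here is a minimal sketch, not taken from either book, that uses the standard MPI window routines MPI_Win_create, MPI_Put, and MPI_Win_fence. The buffer size, the transferred value, and the fence-based synchronization are illustrative choices only.

    /* Minimal one-sided (RMA) sketch: rank 0 puts a value into a window
     * exposed by rank 1.  Run with at least two processes, e.g.
     * mpiexec -n 2 ./rma_put */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, buf = -1;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Every process exposes one int; only rank 1's copy is targeted */
        MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);            /* open an access/exposure epoch */
        if (rank == 0 && size > 1) {
            int value = 42;
            MPI_Put(&value, 1, MPI_INT, 1 /* target rank */,
                    0 /* displacement */, 1, MPI_INT, win);
            /* note: no matching receive is needed on rank 1 */
        }
        MPI_Win_fence(0, win);            /* complete the transfer */

        if (rank == 1)
            printf("rank 1 received %d via MPI_Put\n", buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }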

What is MPI?

MPI, the Message-Passing Interface, is an application programmer interface (API) for programming parallel computers. It was first released in 1992 and transformed scientific parallel computing. Today, MPI is widely used on everything from laptops (where it makes developing and debugging parallel programs easy) to the world's largest and fastest computers. Among the reasons for the success of MPI is its focus on performance, scalability, and support for building tools and libraries that extend the power of MPI.
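
As a concrete illustration of the message-passing model, the following C program (a sketch of ours, not an example from the books) has each process report its rank and sends one integer from rank 1 to rank 0; the tag value and the message contents are arbitrary.

    /* Minimal MPI sketch: report ranks and pass one message. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */

        printf("Hello from rank %d of %d\n", rank, size);

        if (size > 1) {
            int msg;
            if (rank == 1) {
                msg = 99;
                MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            } else if (rank == 0) {
                MPI_Recv(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 0 received %d from rank 1\n", msg);
            }
        }

        MPI_Finalize();
        return 0;
    }

Compile with an MPI wrapper compiler such as mpicc and launch with mpiexec (for example, mpiexec -n 4 ./hello); the exact command names depend on the MPI implementation installed.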

Examples

Errata

  • Using MPI (none yet!)
  • Using Advanced MPI (none yet!)

News and Reviews

  • Blog entry by Torsten Hoefler, one of the authors of Using Advanced MPI.

Tables of Contents

Using MPI, 3rd Edition

Series Foreword
Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition
1 Background
1.1 Why Parallel Computing?
1.2 Obstacles to Progress
1.3 Why Message Passing?
1.3.1 Parallel Computational Models
1.3.2 Advantages of the Message-Passing Model
1.4 Evolution of Message-Passing Systems
1.5 The MPI Forum
2 Introduction to MPI
2.1 Goal
2.2 What Is MPI?
2.3 Basic MPI Concepts
2.4 Other Interesting Features of MPI
2.5 Is MPI Large or Small?
2.6 Decisions Left to the Implementor
3 Using MPI in Simple Programs
3.1 A First MPI Program
3.2 Running Your First MPI Program
3.3 A First MPI Program in C
3.4 Using MPI from Other Languages
3.5 Timing MPI Programs
3.6 A Self-Scheduling Example: Matrix-Vector Multiplication
3.7 Studying Parallel Performance
3.7.1 Elementary Scalability Calculations
3.7.2 Gathering Data on Program Execution
3.7.3 Instrumenting a Parallel Program with MPE Logging
3.7.4 Events and States
3.7.5 Instrumenting the Matrix-Matrix Multiply Program
3.7.6 Notes on Implementation of Logging
3.7.7 Graphical Display of Logfiles
3.8 Using Communicators
3.9 Another Way of Forming New Communicators
3.10 A Handy Graphics Library for Parallel Programs
3.11 Common Errors and Misunderstandings
3.12 Summary of a Simple Subset of MPI
3.13 Application: Computational Fluid Dynamics
3.13.1 Parallel Formulation
3.13.2 Parallel Implementation
4 Intermediate MPI
4.1 The Poisson Problem
4.2 Topologies
4.3 A Code for the Poisson Problem
4.4 Using Nonblocking Communications
4.5 Synchronous Sends and “Safe” Programs
4.6 More on Scalability
4.7 Jacobi with a 2-D Decomposition
4.8 An MPI Derived Datatype
4.9 Overlapping Communication and Computation
4.10 More on Timing Programs
4.11 Three Dimensions
4.12 Common Errors and Misunderstandings
4.13 Application: Nek5000/NekCEM
5 Fun with Datatypes
5.1 MPI Datatypes
5.1.1 Basic Datatypes and Concepts
5.1.2 Derived Datatypes
5.1.3 Understanding Extents
5.2 The N-Body Problem
5.2.1 Gather
5.2.2 Nonblocking Pipeline
5.2.3 Moving Particles between Processes
5.2.4 Sending Dynamically Allocated Data
5.2.5 User-Controlled Data Packing
5.3 Visualizing the Mandelbrot Set
5.3.1 Sending Arrays of Structures
5.4 Gaps in Datatypes
5.5 More on Datatypes for Structures
5.6 Deprecated and Removed Functions
5.7 Common Errors and Misunderstandings
5.8 Application: Cosmological Large-Scale Structure Formation
6 Parallel Libraries
6.1 Motivation
6.1.1 The Need for Parallel Libraries
6.1.2 Common Deficiencies of Early Message-Passing Systems
6.1.3 Review of MPI Features That Support Libraries
6.2 A First MPI Library
6.3 Linear Algebra on Grids
6.3.1 Mappings and Logical Grids
6.3.2 Vectors and Matrices
6.3.3 Components of a Parallel Library
6.4 The LINPACK Benchmark in MPI
6.5 Strategies for Library Building
6.6 Examples of Libraries
6.7 Application: Nuclear Green’s Function Monte Carlo
7 Other Features of MPI
7.1 Working with Global Data
7.1.1 Shared Memory, Global Data, and Distributed Memory
7.1.2 A Counter Example
7.1.3 The Shared Counter Using Polling Instead of an Extra Process
7.1.4 Fairness in Message Passing
7.1.5 Exploiting Request-Response Message Patterns
7.2 Advanced Collective Operations
7.2.1 Data Movement
7.2.2 Collective Computation
7.2.3 Common Errors and Misunderstandings
7.3 Intercommunicators
7.4 Heterogeneous Computing
7.5 Hybrid Programming with MPI and OpenMP
7.6 The MPI Profiling Interface
7.6.1 Finding Buffering Problems
7.6.2 Finding Load Imbalances
7.6.3 Mechanics of Using the Profiling Interface
7.7 Error Handling
7.7.1 Error Handlers
7.7.2 Example of Error Handling
7.7.3 User-Defined Error Handlers
7.7.4 Terminating MPI Programs
7.7.5 Common Errors and Misunderstandings
7.8 The MPI Environment
7.8.1 Processor Name
7.8.2 Is MPI Initialized?
7.9 Determining the Version of MPI
7.10 Other Functions in MPI
7.11 Application: No-Core Configuration Interaction Calculations in Nuclear Physics
8 Understanding How MPI Implementations Work
8.1 Introduction
8.1.1 Sending Data
8.1.2 Receiving Data
8.1.3 Rendezvous Protocol
8.1.4 Matching Protocols to MPI’s Send Modes
8.1.5 Performance Implications
8.1.6 Alternative MPI Implementation Strategies
8.1.7 Tuning MPI Implementations
8.2 How Difficult Is MPI to Implement?
8.3 Device Capabilities and the MPI Library Definition
8.4 Reliability of Data Transfer
9 Comparing MPI with Sockets
9.1 Process Startup and Shutdown
9.2 Handling Faults
10 Wait! There’s More!
10.1 Beyond MPI-1
10.2 Using Advanced MPI
10.3 Will There Be an MPI-4?
10.4 Beyond Message Passing Altogether
10.5 Final Words
Glossary of Selected Terms
A The MPE Multiprocessing Environment
A.1 MPE Logging
A.2 MPE Graphics
A.3 MPE Helpers
B MPI Resources Online
C Language Details
C.1 Arrays in C and Fortran
C.1.1 Column and Row Major Ordering
C.1.2 Meshes vs. Matrices
C.1.3 Higher Dimensional Arrays
C.2 Aliasing
References
Subject Index
Function and Term Index
 
Using Advanced MPI

Series Foreword
Foreword
Preface
1 Introduction
1.1 MPI-1 and MPI-2
1.2 MPI-3
1.3 Parallelism and MPI
1.3.1 Conway’s Game of Life
1.3.2 Poisson Solver
1.4 Passing Hints to the MPI Implementation with MPI_Info
1.4.1 Motivation, Description, and Rationale
1.4.2 An Example from Parallel I/O
1.5 Organization of This Book
2 Working with Large-Scale Systems
2.1 Nonblocking Collectives
2.1.1 Example: 2-D FFT
2.1.2 Example: Five-Point Stencil
2.1.3 Matching, Completion, and Progression
2.1.4 Restrictions
2.1.5 Collective Software Pipelining
2.1.6 A Nonblocking Barrier?
2.1.7 Nonblocking Allreduce and Krylov Methods
2.2 Distributed Graph Topologies
2.2.1 Example: The Peterson Graph
2.2.2 Edge Weights
2.2.3 Graph Topology Info Argument
2.2.4 Process Reordering
2.3 Collective Operations on Process Topologies
2.3.1 Neighborhood Collectives
2.3.2 Vector Neighborhood Collectives
2.3.3 Nonblocking Neighborhood Collectives
2.4 Advanced Communicator Creation
2.4.1 Nonblocking Communicator Duplication
2.4.2 Noncollective Communicator Creation
3 Introduction to Remote Memory Operations
3.1 Introduction
3.2 Contrast with Message Passing
3.3 Memory Windows
3.3.1 Hints on Choosing Window Parameters
3.3.2 Relationship to Other Approaches
3.4 Moving Data
3.4.1 Reasons for Using Displacement Units
3.4.2 Cautions in Using Displacement Units
3.4.3 Displacement Sizes in Fortran
3.5 Completing RMA Data Transfers
3.6 Examples of RMA Operations
3.6.1 Mesh Ghost Cell Communication
3.6.2 Combining Communication and Computation
3.7 Pitfalls in Accessing Memory
3.7.1 Atomicity of Memory Operations
3.7.2 Memory Coherency
3.7.3 Some Simple Rules for RMA
3.7.4 Overlapping Windows
3.7.5 Compiler Optimizations
3.8 Performance Tuning for RMA Operations
3.8.1 Options for MPI_Win_create
3.8.2 Options for MPI_Win_fence
4 Advanced Remote Memory Access
4.1 Passive Target Synchronization
4.2 Implementing Blocking, Independent RMA Operations
4.3 Allocating Memory for MPI Windows
4.3.1 Using MPI_Alloc_mem and MPI_Win_allocate from C
4.3.2 Using MPI_Alloc_mem and MPI_Win_allocate from Fortran 2008
4.3.3 Using MPI_ALLOC_MEM and MPI_WIN_ALLOCATE from Older Fortran
4.4 Another Version of NXTVAL
4.4.1 The Nonblocking Lock
4.4.2 NXTVAL with MPI_Fetch_and_op
4.4.3 Window Attributes
4.5 An RMA Mutex
4.6 Global Arrays
4.6.1 Create and Free
4.6.2 Put and Get
4.6.3 Accumulate
4.6.4 The Rest of Global Arrays
4.7 A Better Mutex
4.8 Managing a Distributed Data Structure
4.8.1 A Shared-Memory Distributed List Implementation
4.8.2 An MPI Implementation of a Distributed List
4.8.3 Inserting into a Distributed List
4.8.4 An MPI Implementation of a Dynamic Distributed List
4.8.5 Comments on More Concurrent List Implementations
4.9 Compiler Optimization and Passive Targets
4.10 MPI RMA Memory Models
4.11 Scalable Synchronization
4.11.1 Exposure and Access Epochs
4.11.2 The Ghost-Point Exchange Revisited
4.11.3 Performance Optimizations for Scalable Synchronization
4.12 Summary
5 Using Shared Memory with MPI
5.1 Using MPI Shared Memory
5.1.1 Shared On-Node Data Structures
5.1.2 Communication through Shared Memory
5.1.3 Reducing the Number of Subdomains
5.2 Allocating Shared Memory
5.3 Address Calculation
6 Hybrid Programming
6.1 Background
6.2 Thread Basics and Issues
6.2.1 Thread Safety
6.2.2 Performance Issues with Threads
6.2.3 Threads and Processes
6.3 MPI and Threads
6.4 Yet Another Version of NXTVAL
6.5 Nonblocking Version of MPI_Comm_accept
6.6 Hybrid Programming with MPI
6.7 MPI Message and Thread-Safe Probe
7 Parallel I/O
7.1 Introduction
7.2 Using MPI for Simple I/O
7.2.1 Using Individual File Pointers
7.2.2 Using Explicit Offsets
7.2.3 Writing to a File
7.3 Noncontiguous Accesses and Collective I/O
7.3.1 Noncontiguous Accesses
7.3.2 Collective I/O
7.4 Accessing Arrays Stored in Files
7.4.1 Distributed Arrays
7.4.2 A Word of Warning about Darray
7.4.3 Subarray Datatype Constructor
7.4.4 Local Array with Ghost Area
7.4.5 Irregularly Distributed Arrays
7.5 Nonblocking I/O and Split Collective I/O
7.6 Shared File Pointers
7.7 Passing Hints to the Implementation
7.8 Consistency Semantics
7.8.1 Simple Cases
7.8.2 Accessing a Common File Opened with MPI_COMM_WORLD
7.8.3 Accessing a Common File Opened with MPI_COMM_SELF
7.8.4 General Recommendation
7.9 File Interoperability
7.9.1 File Structure
7.9.2 File Data Representation
7.9.3 Use of Datatypes for Portability
7.9.4 User-Defined Data Representations
7.10 Achieving High I/O Performance with MPI
7.10.1 The Four “Levels” of Access
7.10.2 Performance Results
7.11 An Example Application
7.12 Summary
8 Coping with Large Data
8.1 MPI Support for Large Data
8.2 Using Derived Datatypes
8.3 Example
8.4 Limitations of This Approach
8.4.1 Collective Reduction Functions
8.4.2 Irregular Collectives
9 Support for Performance and Correctness Debugging
9.1 The Tools Interface
9.1.1 Control Variables
9.1.2 Performance Variables
9.2 Info, Assertions, and MPI Objects
9.3 Debugging and the MPIR Debugger Interface
9.4 Summary
10 Dynamic Process Management
10.1 Intercommunicators
10.2 Creating New MPI Processes
10.2.1 Parallel cp: A Simple System Utility
10.2.2 Matrix-Vector Multiplication Example
10.2.3 Intercommunicator Collective Operations
10.2.4 Intercommunicator Point-to-Point Communication
10.2.5 Finding the Number of Available Processes
10.2.6 Passing Command-Line Arguments to Spawned Programs
10.3 Connecting MPI Processes
10.3.1 Visualizing the Computation in an MPI Program
10.3.2 Accepting Connections from Other Programs
10.3.3 Comparison with Sockets
10.3.4 Moving Data between Groups of Processes
10.3.5 Name Publishing
10.4 Design of the MPI Dynamic Process Routines
10.4.1 Goals for MPI Dynamic Process Management
10.4.2 What MPI Did Not Standardize
11 Working with Modern Fortran
11.1 The mpi_f08 Module
11.2 Problems with the Fortran Interface
11.2.1 Choice Parameters in Fortran
11.2.2 Nonblocking Routines in Fortran
11.2.3 Array Sections
11.2.4 Trouble with LOGICAL
12 Features for Libraries
12.1 External Interface Functions
12.1.1 Decoding Datatypes
12.1.2 Generalized Requests
12.1.3 Adding New Error Codes and Classes
12.2 Mixed-Language Programming
12.3 Attribute Caching
12.4 Using Reduction Operations Locally
12.5 Error Handling
12.5.1 Error Handlers
12.5.2 Error Codes and Classes
12.6 Topics Not Covered in This Book
13 Conclusions
13.1 MPI Implementation Status
13.2 Future Versions of the MPI Standard
13.3 MPI at Exascale
MPI Resources on the World Wide Web
References
Subject Index
Function and Term Index