MPI-3.0 chugging along

Here are some updates from the March MPI Forum. We decided that the door is now closed for new proposals, so MPI-3.0 could be ratified at the December meeting if everything goes well!

Otherwise, we made huge progress on many small things. There were many readings and votes on minor tickets; the results can be found here. The most interesting proposals for me were #284 (Allocate a shared memory window), #286 (Noncollective Communicator Creation), and #168 (Nonblocking Communicator Duplication), which all passed their first vote. The Fortran bindings ticket #229 passed its second vote! Scalable vector collectives (#264) were postponed to the next MPI version because the Forum felt that they would need more investigation of several alternative options.

I explained those and other interesting tickets in my last post on MPI-3.0.

We also made substantial progress on Fault Tolerance (which remains a controversial topic for several reasons) and a lot of cleanup (thanks Rolf!). The next meeting in Japan will be exciting!

MPI-3.0 is Coming—an Overview of new (and old) Features

UPDATE: The new MPI-3 book has appeared. It covers everything on this page (plus other advanced MPI features) in well-written form, with examples for practitioners. More information. Direct link to Amazon.

I am involved in the MPI Forum which is developing and ratifying the Message Passing Interface standards. Actually, I managed to attend every single MPI Forum meeting (27 so far) since the Forum reconvened in Jan. 2008 and I also co-authored MPI-2.1 and MPI-2.2.

The MPI Forum strives to release MPI-3.0 asap (which may mean in a year or so ;-)), so most, if not all, significant proposals are in a feature-freeze and polishing stage. I’ll try to summarize the hot topics in MPI-3.0 (in no particular order) here. The Forum is public (join us!) and all meetings and activities are documented at http://meetings.mpi-forum.org/. However, the wiki and meeting structure are hard to follow for people who do not regularly attend the meetings (actually, it’s even hard for people who do).

The MPI-3.0 ticket process is relatively simple: a ticket is put together by a subgroup or an individual and discussed in the chapter working group. Then it is brought forward for discussion to the full Forum, formally read in a plenary session, and voted on twice. The reading and the votes happen at different meetings, i.e., a ticket needs at least six months to be ratified (this gives the Forum time to check it for correctness). Non-trivial changes are also not possible after a reading. Both votes have to pass for the ticket to be ratified. Then it is integrated into the draft standard by the chapter author(s). Finally, at the end of the process, each chapter is voted on by the Forum, and after that (mostly a formality) there will be a vote on the whole standard. Votes are by organization, and an organization has to participate regularly in the Forum to be eligible to vote (it has to have been present at two of the three meetings before the vote). Input from the public is generally valued and should be communicated through the mailing list or Forum members.

Keep in mind that this list and the comments represent my personal view; only the final standard is the last word! You can find the original tickets by appending the ticket ID to https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/ , e.g., https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/109 for nonblocking collective operations.

1) Nonblocking Collective Operations #109

Status: passed

The nonblocking collectives proposal (#109) was the first one voted into MPI-3.0, more than a year ago (Oct. 2010). The proposal dates back to one of the first meetings in 2008 (I wanted it in MPI-2.2, but we decided to save this “major change” for MPI-3 to make the adoption of MPI-2.2 faster). Since this proposal came first, it was used to define much of the process for MPI-3.0 and it was also probably scrutinized the most :-). It actually seems rather simple, but there were some subtle corner-cases that needed to be defined. In the end, it allows one to issue “immediate” (that’s where the “I” comes from) collective operations, such as:

MPI_Ibcast(buf, count, type, root, comm, &request);
... // compute
MPI_Wait(&request, &status);

This can be used to overlap computation and communication and enables several use-cases such as software pipelining (cf. Hoefler, Gottschling, Lumsdaine: “Leveraging Non-blocking Collective Communication in High-performance Applications”) or also interesting parallel protocols that require the nonblocking semantics (cf. Hoefler, Siebert, Lumsdaine: “Scalable Communication Protocols for Dynamic Sparse Data Exchange”).
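
As a small, self-contained illustration (my own sketch, not taken from the proposal), here is how one can overlap an MPI_Iallreduce with independent local work:

// overlap a global reduction with independent local work
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double local = (double)rank, global = 0.0;
  MPI_Request req;

  // start the reduction "immediately" (hence the I prefix)
  MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

  // work that does not depend on 'global' can proceed while it progresses
  double unrelated = 0.0;
  for (int i = 0; i < 1000000; i++) unrelated += 1e-6;

  MPI_Wait(&req, MPI_STATUS_IGNORE); // completes the collective
  if (rank == 0) printf("sum of ranks = %.0f (unrelated work: %f)\n", global, unrelated);

  MPI_Finalize();
  return 0;
}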

A reference implementation is available with LibNBC, and I’m looking forward to optimized platform-specific versions with full asynchronous progression! The latest (svn) version of MPICH2 already supports them (while some other MPI implementations are still working on MPI-2.2 compliance).

2) Neighborhood Collectives #258

Status: passed

Neighborhood (formerly known as sparse) collective operations extend the distributed graph and Cartesian process topologies with additional communication power. A user can now statically define a communication topology and also perform communication functions between neighbors in this topology! For example:


// create a 3-d periodic topology
int dims[3] = {2, 2, 2}, periods[3] = {1, 1, 1};
MPI_Cart_create(comm, 3, dims, periods, 1, &newcomm);
... // read input data according to process order in newcomm
while (!converged) {
  // start neighbor communication
  MPI_Ineighbor_alltoall(..., newcomm, &req);
  ... // compute inner parts
  MPI_Wait(&req, MPI_STATUS_IGNORE);
  ... // compute outer parts
}

This obviously simplifies the MPI code quite a bit (compared to the old “north, south, west, east” exchanges with extra pack/send/recv/unpack code for each direction) and often improves performance. This can also be nicely combined with MPI datatypes (neighbor_alltoallw) to offer a very high abstraction level. Distributed graph communicators enable the specification of completely arbitrary communication relations. A more complex (application) example is described in Hoefler, Lorenzen, Lumsdaine: “Sparse Non-Blocking Collectives in Quantum Mechanical Calculations”.
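
Since distributed graph communicators allow arbitrary neighborhoods, the same collectives work there as well. Here is a rough, self-contained sketch (my own example, not from the proposal) of a ring exchange over a distributed graph communicator built with the MPI-2.2 adjacent constructor:

// each process exchanges one integer with its left and right ring neighbor
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int left = (rank - 1 + size) % size, right = (rank + 1) % size;
  int sources[2]      = {left, right};   // I receive from these ranks
  int destinations[2] = {left, right};   // I send to these ranks

  MPI_Comm graphcomm;
  MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 2, sources, MPI_UNWEIGHTED,
                                 2, destinations, MPI_UNWEIGHTED,
                                 MPI_INFO_NULL, 0, &graphcomm);

  int sendbuf[2] = {rank, rank}, recvbuf[2];
  // recvbuf[0] arrives from 'left', recvbuf[1] from 'right' (order of 'sources')
  MPI_Neighbor_alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, graphcomm);
  printf("rank %d received %d (left) and %d (right)\n", rank, recvbuf[0], recvbuf[1]);

  MPI_Comm_free(&graphcomm);
  MPI_Finalize();
  return 0;
}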

The MPI implementation can optimize the topology and the message schedule for those functions in the graph or Cartesian communicator creation call. Optimization opportunities and a neighborhood_reduce call (which the Forum decided to remove from the proposal) are discussed in Hoefler, Traeff: “Sparse Collective Operations for MPI”.

3) Matched probe #38

Status: passed

One of the oldest tickets, which we (Doug Gregor, who originally identified the problem when providing C# bindings for MPI, and I) proposed for MPI-2.2. It was deferred to MPI-3.0 for various reasons. This ticket fixes an old bug in MPI-2 where one could not safely probe for messages in a multi-threaded environment. The issue is somewhat subtle and complex to explain. For good examples and a description of the complexity of the problem and the performance of the solution, refer to Hoefler, Bronevetsky, Barrett, de Supinski, Lumsdaine: “Efficient MPI Support for Advanced Hybrid Programming Models”.

The new interface works by removing the message at probe time from the matching queue and allowing the receiver to match it later with a special call:


MPI_Mprobe(source, tag, comm, &message, &status);
... // prepare buffer etc.
MPI_Mrecv(buf, count, type, &message, &status);

This avoids “bad” thread interleavings that lead to erroneous receives. Jeff has a good description of the problem in his blog.
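
A slightly fuller sketch of the intended usage (my own, not from the ticket): probe, size the buffer from the returned status, and then receive exactly the matched message:

#include <mpi.h>
#include <stdlib.h>

// thread-safe receive of a message of unknown size: MPI_Mprobe removes the
// message from the matching queue, so no other thread can "steal" it between
// the probe and the receive (assumes the sender sent MPI_INTs)
void recv_unknown_size(int source, int tag, MPI_Comm comm) {
  MPI_Message msg;
  MPI_Status status;
  int count;

  MPI_Mprobe(source, tag, comm, &msg, &status);
  MPI_Get_count(&status, MPI_INT, &count);

  int *buf = malloc(count * sizeof(int));
  MPI_Mrecv(buf, count, MPI_INT, &msg, MPI_STATUS_IGNORE);
  // ... use buf ...
  free(buf);
}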

4) MPIT Tool Interface #266

Status: passed

The new MPI tool interface allows the MPI implementation to expose certain internal variables, counters, and other state to the user (most likely performance tools). The big difference from the various predecessor proposals is that it does not impose any specific structure or implementation choice (such as having an eager protocol) on the MPI implementation. A side-effect of this is that it doesn’t really have to offer anything to the user :-). However, a “high quality” MPI implementation may use this interface to expose relevant state.

It will certainly be very useful for tools and advanced MPI users to investigate performance issues.
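
To give a flavor of the interface, here is a rough sketch (my own, not from the ticket; I am using the MPI_T_ function names as they ended up in the proposal, so details may differ in your installation) that simply lists whatever control variables an implementation chooses to expose:

#include <mpi.h>
#include <stdio.h>

// list all control variables exposed by the MPI library; an implementation
// is free to expose none at all, in which case num is simply 0
int main(int argc, char **argv) {
  int provided, num;
  MPI_T_init_thread(MPI_THREAD_SINGLE, &provided); // independent of MPI_Init

  MPI_T_cvar_get_num(&num);
  for (int i = 0; i < num; i++) {
    char name[256], desc[256];
    int name_len = sizeof(name), desc_len = sizeof(desc);
    int verbosity, bind, scope;
    MPI_Datatype dt;
    MPI_T_enum enumtype;
    MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dt, &enumtype,
                        desc, &desc_len, &bind, &scope);
    printf("cvar %d: %s - %s\n", i, name, desc);
  }

  MPI_T_finalize();
  return 0;
}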

5) C Const Correctness #140

Status: passed

This sounds rather small but was a major pain to pass (anybody remember why?). This ticket basically makes the C interface const-correct, i.e., adds the const qualifier to the input buffer and array arguments of the C interface functions. All C++ functions already have const qualifiers.

This turns

int MPI_Gather(void*, int, MPI_Datatype, void*, int, MPI_Datatype, int, MPI_Comm);

into

int MPI_Gather(const void*, int, MPI_Datatype, void*, int, MPI_Datatype, int, MPI_Comm);

and thus allows several compiler optimizations and prevents some user errors (produces compiler warnings at least).

6) Updated One Sided Chapter #270

Status: passed

This proposal killed probably half of my free time over the last few years. It started at the Portland Forum meeting in 2009, where another group was proposing to essentially rewrite the MPI-2 One Sided chapter from scratch. I disagreed vehemently because the proposed text would not allow an implementation on systems that are not cache coherent. MPI-2 handled the cache coherency issue very elegantly but was in many places hard to use and even harder to understand.

After a night of re-writing the existing chapter to differentiate between two memory models (essentially “cache-coherent” and “not cache-coherent”, in MPI lingo “unified” and “separate” public and private windows), a new proposal was born. A subgroup started to bang on it (and erased essentially 80% of the ideas, replacing them with better ones!) and two years later we had what is probably the hairiest part of MPI (memory models are extremely complex). The RMA working group was a lot of fun; many inspiring discussions led us to a good solution! This chapter was a good example of an excellent group effort!

The new chapter offers:

  • two memory models: one supporting cache-coherent systems (similar to many PGAS languages) and the other one is essentially the “old” MPI-2 model
  • different ordering modes for accumulate accesses (warning: the safe default mode may be easy to reason about but slower)
  • MPI_Win_allocate, a collective window creation function that allocates (potentially symmetric or specialized) memory for faster One Sided access
  • MPI_Win_create_dynamic, a mechanism to create a window that spans the whole address space together with functions to register (MPI_Win_attach) and deregister (MPI_Win_detach) memory locally
  • MPI_Get_accumulate, a fetch-and-accumulate function to atomically fetch and apply an operation to a variable
  • MPI_Fetch_and_op, a more specialized version of MPI_Get_accumulate with fewer parameters, for atomic access to scalars only
  • MPI_Compare_and_swap, a CAS function as we know it from shared memory multiprogramming
  • MPI_R{put,get,accumulate,get_accumulate}, request-based MPI functions for local completion checking without window synchronization functions
  • MPI_Win_{un}lock_all, a function to (un)lock all processes in a window from a single process (not collective!)
  • MPI_Win_flush{_all}, a way to complete all outstanding operations to a specific target process (or all processes). Upon return of this function, the operations have completed at the target (either in the private or public window copy)
  • MPI_Win_flush_local{_all}, a function to complete all operations locally to a specified process (or all processes). This does not include remote completion but local buffers can be re-used
  • conflicting accesses are now allowed but the outcome is undefined (and may corrupt the window). This is similar to the C++ memory model

Of course, nobody can understand the power of the new One Sided interface based on this small list without examples. The One Sided working group is working on more documentation and similar posts; I plan to link or mirror them here!
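
Until then, here is a tiny sketch of my own (not normative) showing the new passive-target style: allocate window memory with MPI_Win_allocate and update the right neighbor with MPI_Put under MPI_Win_lock_all/MPI_Win_flush; the barrier plus MPI_Win_sync makes the remote update visible to local loads in both memory models:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int *base;
  MPI_Win win;
  MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD,
                   &base, &win);

  MPI_Win_lock_all(0, win);                 // shared lock on all targets
  int right = (rank + 1) % size;
  MPI_Put(&rank, 1, MPI_INT, right, 0, 1, MPI_INT, win);
  MPI_Win_flush(right, win);                // put is complete at the target
  MPI_Barrier(MPI_COMM_WORLD);              // everybody's put has completed
  MPI_Win_sync(win);                        // sync public and private copies
  printf("rank %d: my window now contains %d\n", rank, *base);
  MPI_Win_unlock_all(win);

  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}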

7) Allocating a Shared Memory Window #284

Status: read

Several groups wanted the ability to create shared memory in MPI. This would make it possible to share data structures across all MPI processes on a multicore node, similar to OpenMP. However, unlike OpenMP, one would just share single objects (arrays etc.) and not the whole address space. The idea here is to combine this with One Sided and allow the creation of a window that is accessible with load/store (ISA) instructions by all participating processes.

This extends the already complex One Sided chapter (semantics) with the concept of local and remote memory. The proposal is still under discussion and may change. Currently, one can create such a window with MPI_Win_allocate_shared(size, info, comm, baseptr, win) and then use One Sided synchronization (flush and friends) to access it.

By default, the allocated memory is contiguous across process boundaries (process x’s memory starts where process x-1’s memory ends). The info key alloc_shared_noncontig can be used to relax this and allow the implementation to allocate memory close to each process (on NUMA systems). Then, the user has to use the function MPI_Win_shared_query() to determine the base address of a remote process’s memory segment.

MPI-3.0 will also offer a special communicator split function that can be used to create a set of communicators which only include processes that can create a shared memory window (i.e., mutually share memory).
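
The proposal is still moving, but a rough sketch of my own (using the names as they were eventually settled on, e.g., MPI_Comm_split_type for the split function mentioned above, so take it with a grain of salt) could look like this:

#include <mpi.h>
#include <stdio.h>

// allocate one int per process in a shared window and read the right
// neighbor's element with a plain load
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  MPI_Comm shmcomm;  // all processes that can mutually share memory
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &shmcomm);

  int rank, size;
  MPI_Comm_rank(shmcomm, &rank);
  MPI_Comm_size(shmcomm, &size);

  int *base;
  MPI_Win win;
  MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL, shmcomm,
                          &base, &win);

  MPI_Win_lock_all(0, win);
  base[0] = rank;                  // store into my own segment
  MPI_Win_sync(win);               // memory barrier for the window
  MPI_Barrier(shmcomm);            // wait until everybody has stored
  MPI_Win_sync(win);

  int right = (rank + 1) % size;
  MPI_Aint qsize;
  int qdisp;
  int *rbase;
  MPI_Win_shared_query(win, right, &qsize, &qdisp, &rbase); // neighbor's base
  printf("rank %d sees %d in rank %d's segment\n", rank, rbase[0], right);
  MPI_Win_unlock_all(win);

  MPI_Win_free(&win);
  MPI_Comm_free(&shmcomm);
  MPI_Finalize();
  return 0;
}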

8) Noncollective Communicator Creation #286

Status: read

A very interesting proposal to allow a group of processes to create a communicator “on their own”, i.e., without involving the full parent communicator. This would be very useful for MPI fault tolerance, where it could be used to “fix” a broken communicator (create a communicator with fewer processes). Compare this to Gropp, Lusk: “Fault Tolerance in MPI Programs”. This could be achieved with current functions but would be slow and cumbersome, see Dinan et al.: Noncollective Communicator Creation in MPI.
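
A rough sketch of my own (using MPI_Comm_create_group, the name the interface eventually got; the draft may still call it differently): the even ranks build a communicator among themselves without involving the odd ranks at all:

#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  if (rank % 2 == 0) {                  // only the even ranks participate
    int nevens = (size + 1) / 2;
    int evens[nevens];                  // C99 VLA, just for brevity
    for (int i = 0; i < nevens; i++) evens[i] = 2 * i;

    MPI_Group world_group, even_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, nevens, evens, &even_group);

    MPI_Comm evencomm;
    MPI_Comm_create_group(MPI_COMM_WORLD, even_group, 0 /* tag */, &evencomm);
    // ... use evencomm ...

    MPI_Comm_free(&evencomm);
    MPI_Group_free(&even_group);
    MPI_Group_free(&world_group);
  }

  MPI_Finalize();
  return 0;
}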

9) Nonblocking MPI_Comm_dup #168

Status: read

This very simple proposal allows communicators to be duplicated in a nonblocking way. This makes it possible to hide the communicator-creation latency and also to implement “purely” nonblocking functions without initialization calls (cf. Hoefler, Snir: “Writing Parallel Libraries with MPI – Common Practice, Issues, and Extensions”). There is not much more to say about this simple call :-).
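
Usage would be as simple as this rough fragment (assuming an existing communicator comm; the call is tentatively named MPI_Comm_idup):

MPI_Comm newcomm;
MPI_Request req;
MPI_Comm_idup(comm, &newcomm, &req);
... // other initialization work that does not need newcomm yet
MPI_Wait(&req, MPI_STATUS_IGNORE);
// newcomm is now a full duplicate of comm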

10) Fortran Bindings #229 (+24 more!)

Status: voted

This is a supposedly simple ticket that was developed to improve the Fortran bindings and add Fortran 2008 bindings. It offers type-safety and tries to resolve the issue with Fortran code movement (relying on Fortran TR 29113). I am not a Fortran expert (preferring C++ for scientific computing instead) so I can’t really speak to it. Jeff has a good post on this.

That’s it! Well, I am 100% sure that I forgot several proposals (some may even still be in the pipeline or under the radar) and I’ll add them here as they show up. We also already postponed several features to MPI-3.1, so the MPI train continues to run.

The Forum is also actively seeking feedback from the community. If you are interested in any of the described features, please give the draft standard a read and let us know if you have concerns (or praise :-))!

Torsten Hoefler

Welcome to Blue Drop!

We got our POWER7 780 (MR) System on Friday and I just logged in :-). I’m alone on something like this:

htor@bd01:~> w
 20:49:12 up 1 day,  1:01,  1 user,  load average: 0.00, 0.00, 0.00
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
htor     pts/0     20:35    0.00s  0.04s  0.01s w
htor@bd01:~> cat /proc/cpuinfo 
processor       : 0
cpu             : POWER7 (architected), altivec supported
clock           : 3864.000000MHz
revision        : 2.1 (pvr 003f 0201)

processor       : 1
cpu             : POWER7 (architected), altivec supported
clock           : 3864.000000MHz
revision        : 2.1 (pvr 003f 0201)
........
processor       : 127
cpu             : POWER7 (architected), altivec supported
clock           : 3864.000000MHz
revision        : 2.1 (pvr 003f 0201)

timebase        : 512000000
platform        : pSeries
model           : IBM,9179-MHB
machine         : CHRP IBM,9179-MHB

It’s SMT=4 though (so “only” 32 cores) but with sweet 128 GiB memory. With 4 FMAs per cycle, that’s 983.04 GF/s, nearly one TF in a single “rack” (that thing below)!

I particularly like the name … we all hope that the drop will quickly fill up a bucket full of water(s) :-).

(credits: Steve Kleinvehn, original)

SC09 in Portland

Yes, still stuck in Rainland. This year’s SC was the best I ever attended, to be honest. It felt like I knew everybody, and I was invited to two or three parties every evening (awesome). My work was presented at the MPI Forum BoF and the FASTOS BoF. I was also invited to present parts of my MPI Forum work at the MPICH2 BoF, which was great! However, I fell sick on Wednesday and felt really bad on Thursday and much worse on Friday (which made me miss the morning panel and the SC10 committee meeting).

And the two most exciting happenings:

  1. I entered two drawings and won twice! (an X-box Elite from Microsoft *YAY* (thanks Fab!) and some USB stick from teragrid (I hope they’ll send it to me))
  2. I was complaining about the low threading support (only four) in the Power 7 and the random guy next to me started to explain why. It turned out that this guy was Burton Smith! He entered my personal hall of fame after his keynote at SPAA 2008. This man knows exactly what he is talking about, and we chatted a long time (until the show closed) about network topologies and routing. It was surprising to me that he mentioned the same fundamental insight about topologies that I had arrived at independently about a month ago. He also studied Cayley graphs and friends … five years ago (D’oh, I’m too young!).

November MPI Forum in Portland

They should have called the place “Rainland” but ok, I brought an umbrella 🙂 .

This week’s MPI Forum was very interesting! Marc Snir presented his convincing hybrid proposal. It’s really nice and orthogonal to the current standard. It needs some minor polishing and an implementation and seems ready to go.

We had some incremental discussions in the collectives working group but nothing very exciting. I think it is time to look for applications/systems that can benefit from the sparse collective proposal. Sameer Kumar sent me a very interesting paper which seems to be what we need! We also assimilated the Topology chapter into the collectives working group (now called collectives and topology working group — short colltop 🙂 ).

The RMA discussions were helpful this time and motivated me to summarize all ideas that floated in my head into a patch to the MPI-2.2 standard document during the weekend. I’ll post it to the RMA list and will see what happens. I think the RMA interface in MPI-2.0 is rather elegant and only needs some minor tweaks and some semantic changes to make it useful.

The MPI-3 discussions were going in circles (again). We went back and forth on whether we should call our next release (which contains nonblocking collectives and probably support for hybrid environments) MPI 2.3 or MPI 3.0 draft. We didn’t come to any conclusion. The only decision we made (I think; we didn’t vote though) is that we don’t want to break source compatibility in the next revision yet. I’d like to call it 2.3 then, because having a 3.0 draft means that 3.0 will be a similar release and we would probably break compatibility in 3.1, which doesn’t seem too useful. 2.3 also gives the user a better impression that it’s not a revolutionary new thing (e.g., fault tolerance). However, I don’t have too strong an opinion; I just have some users who want nonblocking collectives in a release that is at least source compatible.

Another really annoying thing is the whole MPI_Count story. I have to admit that I was in favor of it at the beginning because the abstraction seems right to me; however, I am now really against it for several reasons: (1) the workaround is trivial and causes negligible overhead, (2) it breaks source compatibility, which is a total no-go, and (3) it causes all kinds of Fortran 77 problems (it seems that this is the reason why int was selected in the first place). Could we just withdraw the ticket please?

ICPP 2009 in Vienna

I presented our initial work on Offloading Collective Operations, which is the definition of an Assembly language for group operations (GOAL), at ICPP’09 in Vienna. I was rather disappointed by this year’s ICPP. We had some problems with the program selection already before the conference (I’ll happily tell you details on request) and the final program was not great. Some talks were very entertaining though. I really enjoyed the P2S2 workshop, especially Pete Beckman’s keynote. Other highlights (in my opinion) include:

  • Mondrian’s “A resource optimized remote-memory-access architecture for low-latency communication” (I need to talk to those guys (I did ;))
  • Argonne’s “Improving Resource Availability By Relaxing Network Allocation Constraints on the Blue Gene/P” (I need to read the paper because I missed the talk due to chaotic re-scheduling, but Narayan’s 5-minute elevator pitch summary seemed very interesting)
  • Prof. Resch’s keynote on “Simulation Performance through Parallelism – Challenges and Options” (he even mentioned the German Pirate party which I really enjoyed!)
  • Brice’s work with Argonne on “Cache-Efficient, Intranode Large-Message MPI Communication with MPICH2-Nemesis”
  • Argonne’s “End-to-End Study of Parallel Volume Rendering on the IBM Blue Gene/P” (yes, another excellent Argonne talk right before my presentation :))

Here are some nice pictures:
My talk on the last day was a real success (very well attended, even though it was the last talk of the conference)! It’s good to have friends (and a good talk from Argonne right before mine :-)). Btw., two of the three talks in the (only) “Information Retrieval” session were completely misplaced and had nothing to do with it, weird …

My co-author, friendly driver, and camera-man and me in front of the parliament.

EuroPVM/MPI 2009 report

This year’s EuroPVM/MPI was held in Helsinki (not quite, but close to it). I stayed in Hanasaari, a beautiful island with a small hotel and conference center on it. It’s a bit remote but nicely surrounded by nature.

The conference was nice; I learned about formal verification of MPI programs in the first day’s tutorial. This technique seems really nice for non-deterministic MPI programs (how many are there?) but there are certainly some open problems (similar to the state explosion of thread-checkers). The remainder of the conference was very nice and it felt good to meet the usual MPI suspects again. Some highlights were, in my opinion:

  • Edgar’s “VolpexMPI: an MPI Library for Execution of Parallel Applications on Volatile Nodes” (indeterminism is an interesting discussion in this context)
  • Rusty’s keynote on “Using MPI to Implement Scalable Libraries” (which I suspect could use collectives)
  • Argonne’s “Processing MPI Datatypes Outside MPI” (could be very very useful for LibNBC)
  • and Steven’s invited talk on “Formal Verification for Scientific Computing: Trends and Progress” (an excellent overview for our community)

The whole crowd:

Unfortunately, I had to leave before the MPI Forum information session to catch my flight.
Videos of many talks are available online. All in all, it was worth attending. Next year’s EuroMPI (yes, the conference was finally renamed after the second year in a row without a PVM paper) will be in Stuttgart. So stay tuned and submit papers!

MPI 2.2 is now officially ratified by the MPI Forum!

I just came back from lunch after the MPI Forum meeting in Helsinki. This meeting focused again (the last time) on MPI 2.2. We finished the review of the final document and edited several minor things. Bill did a great job in chairing and pushing the MPI 2.2 work and the overall editing. Unfortunately, we did not meet our own deadlines, i.e., the chapters and reviews were not finished two weeks ago (I tried to push my chapters (5 and 7) as hard as possible, but getting the necessary reviews was certainly not easy). However, the whole document was reviewed (read) by forum members during the meeting and my confidence is high that everybody did a good job.

Here are the results of the official vote on the main document:
yes: 14
no: 1
abstain: 2 (did not participate)

The votes by chapter will be online soon.

The feature-set of the standard did not change. I posted it earlier here, and so did Jeff. But it’s official now! Implementors should now get everything implemented so that all users can enjoy the new features.

Here is a local copy (mirror) of the official document: mpi-report-2_2.pdf (the creation date might change)

One downside is that we already have errata items for things that were discovered too late in the process. This seems odd; however, we decided that we should not break our own rules. And even if the standard says that an MPI_C_BOOL is 4 bytes, we had to close the door for changes at some point. The errata (MPI_C_BOOL is one byte) will be voted on and posted soon on the main webpage.

Rolf will publish the MPI 2.2 book (like he did for MPI 2.1) and it will be available at Supercomputing 2009. I already ordered my copy :).

And now we’re moving on to MPI 3, so stay tuned (or participate in the forum)!


The MPI Standard MPI-2.2 is fixed now

We just finished all voting on the last MPI-2.2 tickets! This means that MPI-2.2 is fixed now; no changes are possible. The remaining work is simply to merge the accepted tickets into the final draft that will be voted on next time in Helsinki. I just finished editing my parts of the standard draft. Everything (substantial) that I proposed made it in with a reasonable quorum. The new graph topology interface was nicely accepted this time (I think I explained it better and I presented an optimized implementation). However, other tickets didn’t go that smoothly. The process seems very interesting from a social perspective (the way we vote has a substantial impact on the results etc.).

Some tickets that I think are worth discussing are:

Add a local Reduction Function – this enables the user to use MPI reduction operations locally (without communication). This is very useful for library implementors (e.g., implementing new collective routines on top of MPI) – PASSED!
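
A tiny illustrative fragment (the call is MPI_Reduce_local): combine a partial result into an accumulator without any communication, e.g., inside a collective implemented on top of MPI:

double partial[4] = {1.0, 2.0, 3.0, 4.0};
double accum[4]   = {0.0, 0.0, 0.0, 0.0};
// accum[i] = accum[i] + partial[i] for all i, applied purely locally
MPI_Reduce_local(partial, accum, 4, MPI_DOUBLE, MPI_SUM);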


Regular (non-vector) version of MPI_Reduce_scatter – this addresses a piece of missing functionality. The current Reduce_scatter should really be called Reduce_scatterv … but it isn’t. Anyway, if you ever asked yourself why the heck you should use Reduce_scatter, think about parallel matrix multiplication! An example is attached to the ticket. – PASSED!
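
The regular version ended up being called MPI_Reduce_scatter_block; here is a rough fragment (my own, assuming a communicator comm with size processes):

int count = 4;                                    // equal block size per process
double *sendbuf = malloc(size * count * sizeof(double));
double *recvbuf = malloc(count * sizeof(double));
// ... fill sendbuf; block i is the contribution destined for rank i ...
// afterwards, recvbuf holds the elementwise sum of everyone's block for my rank
MPI_Reduce_scatter_block(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, comm);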

Add MPI_IN_PLACE option to Alltoall – nobody knows why this is not in MPI-2. I suppose that it seemed complicated to implement (an optimized implementation is indeed NP hard), but we have a simple (non-optimal, linear time) algorithm to do it. It’s attached to the ticket :). – PASSED!
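
For reference, a rough fragment of how the in-place variant is invoked (sendcount and sendtype are ignored when MPI_IN_PLACE is passed; the 0 and MPI_DATATYPE_NULL below are just placeholders):

int *buf = malloc(size * count * sizeof(int));
// ... fill buf; block i goes to rank i and is overwritten with the block received from rank i ...
MPI_Alltoall(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, buf, count, MPI_INT, comm);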


Fix Scalability Issues in Graph Topology Interface – this is in my opinion the most interesting/important addition in MPI-2.2. The graph topology interface in MPI-2.1 is horribly broken in that every process needs to provide the *full* graph to the library (which even for sparse graphs leads to $\Omega(P)$ memory *per node*). I think we have an elegant fix that enables a fully distributed specification of the graph, in which each node specifies only its neighbors. This will be even more interesting in MPI-3, when we start to use the topology as communication context. – PASSED!

Extending MPI_COMM_CREATE to create several disjoint sub-communicators from an intracommunicator – Neat feature that allows you to create multiple communicators with a single call! – PASSED!
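
A rough sketch of my own (assuming rank and size of MPI_COMM_WORLD are known): every process passes only the disjoint group it wants to belong to, and a single call yields two sub-communicators:

MPI_Group world_group, my_group;
MPI_Comm_group(MPI_COMM_WORLD, &world_group);

// even ranks form one group, odd ranks the other
int n = (rank % 2 == 0) ? (size + 1) / 2 : size / 2;
int members[n];                                   // C99 VLA, just for brevity
for (int i = 0; i < n; i++) members[i] = 2 * i + (rank % 2);
MPI_Group_incl(world_group, n, members, &my_group);

MPI_Comm subcomm;
MPI_Comm_create(MPI_COMM_WORLD, my_group, &subcomm);  // one call, two disjoint communicators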

Add MPI_IN_PLACE option to Exscan – again, I don’t know why this is missing in MPI-2.0. The rationale that is given is not convincing. – PASSED!

Define a new MPI_Count Datatype – MPI-2.1 can’t send more than 2^31 (about 2 billion) objects with a single call right now (the count argument is a plain int) – we should fix that! However, we had to move this to MPI-3 due to several issues that came up during the implementation (most likely ABI issues). POSTPONED! It feels really good to have this strict implementation requirement! We will certainly have this important fix in MPI-3!

Add const Keyword to the C bindings – the most discussed feature I guess 🙂 – I am not sure about the consequences yet, but it seems nice to me (so far). – POSTPONED! We moved this to MPI-3 because some part of the Forum wasn’t sure about the consequences. I am personally also going back and forth; the issue with strided datatypes seems really worrisome.

Allow concurrent access to send buffer – most programmers probably did not know that this is illegal, but it certainly is in MPI<=2.0. For example:

int sendbuf;
MPI_Request req[2];
MPI_Isend(&sendbuf, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req[0]);
MPI_Isend(&sendbuf, 1, MPI_INT, 2, 1, MPI_COMM_WORLD, &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

is not valid! Two threads are also not allowed to concurrently send the same buffer. This proposal will allow such access. – PASSED!

MPI_Request_free bad advice to users – I personally think that MPI_Request_free is dangerous (especially in the context of threads) and does not provide much to the user. But we can’t get rid of it … so let’s discourage users from using it! – PASSED!

Deprecate the C++ bindings – that’s funny, isn’t it? But look at the current C++ bindings: they’re nothing more than pimped C bindings and only create problems. Real C++ programmers would use Boost.MPI (which internally uses the C bindings ;)), right? – PASSED (even though I voted against it ;))

Something odd happened to New Predefined Datatypes. We found a small typo in the ticket (MPI_C_BOOL should be 1 byte instead of 4 bytes). However, it wasn’t small enough that we could just change it (the process doesn’t allow significant changes after the first vote). It was voted in with this bug (I abstained after the heavy discussion though) and it’s also too late to file a new ticket to fix it. However, we will have an errata item that clarifies this. It might sound strange, but I’m very happy that we stick to our principles and don’t change anything without proper reviews (these reviews between the meetings, where vendors could get user feedback, have influenced tickets quite a lot in the past). But still PASSED!

For all tickets and votes, see MPI Forum votes!

I’m very satisfied with the way the Forum works (Bill Gropp is doing a great job with MPI-2.2). Hearing about other standardization bodies, I have to say that our rules seem very sophisticated. I think MPI-2.2 will be a nice new standard, which is not only a bugfix release but also offers new opportunities to library developers and users (see the tickets above). We are also planning to have a book again (perhaps with an editorial comment addressing the issue in ticket 18 (MPI_C_BOOL))!