The MPI Forum gathers momentum

We have now been convening for more than a year and just finished the 9th meeting! Along the way, we released the rather unspectacular MPI-2.1 at EuroPVM 2008 in Dublin (but hey, everything is in a single document now!), which didn’t really change anything.

Then, we decided to go for MPI-2.2, which may change some things but doesn’t break anything! We’re still unsure whether we will allow ABI changes, though. But MPI-2.2 will certainly be source-code compatible (so a recompile might be required – which seems not that bad to me). The MPI-2.2 process is designed to guarantee quality. We use the trac system here at IU to manage the changes. Each “ticket” represents a change, which has to be reviewed unofficially by at least four members of the Forum. Then it can be read in front of the whole Forum at any meeting. After that, there is a first and a second vote, and each successful ticket has to pass both. At the end, we vote on the inclusion of each chapter in MPI-2.2. Each ticket must go through this procedure, and only a single state change is allowed during each meeting. This gives the Forum and the public a long time (>8 months) to review the proposals carefully. We also require an (open-source) implementation of each proposed change.

We have been discussing MPI-2.2 for several meetings now – but the last (April ’09) meeting was an important milestone! Since we plan to release MPI-2.2 at this year’s EuroPVM, we had to close the door: effectively, all tickets that were not read at this meeting are postponed to MPI-3. I think we did pretty well, and we’re within our schedule.

Some tickets that I think are interesting are:

Add a local Reduction Function – this enables the user to use MPI reduction operations locally (without communication). This is very useful for library implementors (e.g., implementing new collective routines on top of MPI)
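As a rough illustration, here is how a library might use it, assuming the call ends up with the name and signature MPI_Reduce_local(inbuf, inoutbuf, count, datatype, op) – a sketch, not final standard text:

#include <mpi.h>

/* combine a partial result into an accumulator without any communication */
void combine(double *incoming, double *accum, int n) {
  /* accum[i] = accum[i] (op) incoming[i]; works with user-defined MPI_Ops too */
  MPI_Reduce_local(incoming, accum, n, MPI_DOUBLE, MPI_SUM);
}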


Regular (non-vector) version of MPI_Reduce_scatter – this addresses a kind of missing functionality. The current Reduce_scatter should really be Reduce_scatterv … but it isn’t. Anyway, if you ever asked yourself why the heck you should use Reduce_scatter, think about parallel matrix multiplication! An example is attached to the ticket.
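A minimal sketch of the regular variant, assuming the proposal is adopted under the name MPI_Reduce_scatter_block (the exact name is up to the ticket): every process contributes one block of n elements per process and gets back the reduced block corresponding to its rank.

#include <mpi.h>

/* partial holds p*n elements (one block per process); myblock receives n elements */
void reduce_scatter_regular(double *partial, double *myblock, int n, MPI_Comm comm) {
  MPI_Reduce_scatter_block(partial, myblock, n, MPI_DOUBLE, MPI_SUM, comm);
}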

Add MPI_IN_PLACE option to Alltoall – nobody knows why this is not in MPI-2. I suppose it seemed complicated to implement (an optimized implementation is indeed NP-hard), but we have a simple (non-optimal, linear-time) algorithm to do it. It’s attached to the ticket :).
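Usage would look roughly like this, assuming the usual MPI_IN_PLACE convention that the send count and type are ignored (a sketch, not final standard text):

#include <mpi.h>

/* buf holds one block of blocksize ints per process; blocks are exchanged in place */
void alltoall_in_place(int *buf, int blocksize, MPI_Comm comm) {
  MPI_Alltoall(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, buf, blocksize, MPI_INT, comm);
}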

Fix Scalability Issues in Graph Topology Interface – this is, in my opinion, the most interesting/important addition in MPI-2.2. The graph topology interface in MPI-2.1 is horribly broken in that every process needs to provide the *full* graph to the library (which even for sparse graphs leads to $\Omega(P)$ memory *per node*). I think we have an elegant fix that enables a fully distributed specification of the graph, where each process specifies only its own neighbors. This will become even more interesting in MPI-3, when we start to use the topology as communication context.
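For illustration, a sketch of the distributed specification, assuming the adjacent-specification constructor ends up looking like MPI_Dist_graph_create_adjacent (name and signature hedged): each process passes only its own neighbor lists, so memory scales with its degree instead of with P.

#include <mpi.h>

/* build a communicator with a 1-D ring topology; each process names only
   its left and right neighbor */
MPI_Comm make_ring_topology(MPI_Comm comm) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  int nbrs[2] = { (rank - 1 + size) % size, (rank + 1) % size };

  MPI_Comm topo;
  MPI_Dist_graph_create_adjacent(comm, 2, nbrs, MPI_UNWEIGHTED,
                                 2, nbrs, MPI_UNWEIGHTED,
                                 MPI_INFO_NULL, 0 /* no reorder */, &topo);
  return topo;
}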

Extending MPI_COMM_CREATE to create several disjoint sub-communicators from an intracommunicator – neat feature that allows you to create multiple communicators with a single call!
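A sketch of what that enables, assuming the extension works as proposed (each process passes only the group it belongs to, and the disjoint groups yield separate communicators from one collective call):

#include <mpi.h>
#include <stdlib.h>

/* split comm into an "even ranks" and an "odd ranks" communicator
   with a single MPI_Comm_create call */
MPI_Comm split_even_odd(MPI_Comm comm) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  MPI_Group world_group, my_group;
  MPI_Comm_group(comm, &world_group);

  /* each process lists only the ranks of its own partition */
  int n = 0;
  int *ranks = malloc(size * sizeof(int));
  for (int r = rank % 2; r < size; r += 2) ranks[n++] = r;
  MPI_Group_incl(world_group, n, ranks, &my_group);

  MPI_Comm newcomm;
  MPI_Comm_create(comm, my_group, &newcomm);

  free(ranks);
  MPI_Group_free(&my_group);
  MPI_Group_free(&world_group);
  return newcomm;
}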

Add MPI_IN_PLACE option to Exscan – again, don’t know why this is missing. The rationale that is given is not convincing. (A small sketch follows below.)

Define a new MPI_Count Datatype – MPI-2.1 can’t send more than 2^31 (about 2 billion) objects in a single call right now because the count arguments are plain ints – we should fix that!

Add const Keyword to the C bindings – most discussed feature I guess 🙂 – I am not sure about the consequences yet, but it seems nice to me (so far).
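The promised Exscan sketch – assuming MPI_IN_PLACE gets the same semantics as for Scan, i.e., the input is taken from the receive buffer and overwritten with the result:

#include <mpi.h>

/* in-place exclusive prefix sum: on ranks > 0, *val becomes the sum of the
   values held by ranks 0..rank-1; on rank 0 the result stays undefined,
   just as with the regular Exscan */
void exscan_in_place(long *val, MPI_Comm comm) {
  MPI_Exscan(MPI_IN_PLACE, val, 1, MPI_LONG, MPI_SUM, comm);
}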

Allow concurrent access to send buffer – most programmers probably did not know that this is illegal, but it certainly is. For example:

int sendbuf;
MPI_Request req[2];
MPI_Isend(&sendbuf, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req[0]);
MPI_Isend(&sendbuf, 1, MPI_INT, 2, 1, MPI_COMM_WORLD, &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

is not valid! Two threads are also not allowed to concurrently send the same buffer. This proposal will allow such access.
MPI_Request_free bad advice to users – I personally think that MPI_Request_free is dangerous (especially in the context of threads) and does not provide much to the user. But we can’t get rid of it … so let’s discourage users from using it!

Deprecate the C++ bindings – that’s funny, isn’t it? But look at the current C++ bindings: they’re nothing more than pimped C bindings and only create problems. Real C++ programmers would use Boost.MPI (which internally uses the C bindings ;)), right?

We also made some progress regarding MPI-3, where we can add more complex features that might (!) change the interface (but not break backwards compatibility). So we voted on Nonblocking Collective Operations (#109, my hobbyhorse) – and it passed unanimously!
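To give an idea of what this will look like, here is a sketch assuming the eventual MPI-3 interface provides calls like MPI_Ibcast that return an MPI_Request (the exact names are part of the proposal, not final standard text):

#include <mpi.h>

/* start a broadcast, do independent work while it progresses, then complete it */
void bcast_with_overlap(double *buf, int n, MPI_Comm comm) {
  MPI_Request req;
  MPI_Ibcast(buf, n, MPI_DOUBLE, 0, comm, &req);

  /* ... independent computation that overlaps with the broadcast ... */

  MPI_Wait(&req, MPI_STATUS_IGNORE);
}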

For all votes, see votes.

The Cisco Headquarters in San Jose

Now that I’ve been here multiple times, I thought I just have to try the thing they call “Cisco Burger” in their cafeteria :). So I got one and must say that it’s not better than most American burgers I had before (but what did I expect). Here’s a picture for completeness:

December MPI Forum

Today was the first official reading of my Nonblocking Collective Operations proposal for MPI-3. It was a bit too short, but it went really well. There were lots of discussions about clarifying the text, but the semantics are mostly fixed now. It looks like everything can be fixed before the next meeting. A picture taken by Rolf is here:

Now that it’s getting interesting, I should probably start an MPI blog about new features for MPI-2.2 or MPI-3 :).

MPI Forum No. 3

Just came back from the third MPI Forum in Chicago. We finished MPI-2.1 (I guess) … mostly. I think we’ll just have to vote on it at the next meeting :). It’s good that this is finished and we have a nice single document mandating the newest MPI standard. We’ve also been able to fix many bugs (see the mailing list). This meeting was at the Microsoft location in Chicago … on the 23rd floor of a skyscraper downtown. Pretty neat … nearly. One day, I decided to take the stairs (so as not to get fat in America), only to find the door on the 23rd floor locked after 7 minutes and 552 stairs. Great … actually, all doors but the lobby were locked :-(. So I walked down again *hmpf*. I complained and was told that this is a security feature – how weird: the lobby exit is right next to the elevators, i.e., one can just exit at the lobby and use the elevator to get to any floor – security, huh? I guess this is typical American security, like everyone having to take their shoes off at the airport.

What’s wrong in this picture? Me or the Microsoft logo?

MPI Forum in Chicago

Just came back from the second official meeting. I guess I can claim to be a member now (IU has a vote, which requires attendance at the last two meetings). Not much to say (officially), but it was very, very interesting and productive! I’m looking forward to the next one. Here’s a picture of the crowd (try to find me :)):

(taken by Erez Heba (Microsoft))

MPI Forum meeting in Chicago

I just came back from the MPI Forum meeting. It’s kind of cool … I would never have thought that I would ever drive my own car to an MPI Forum meeting in Chicago – I just did! The meeting was pretty interesting even though most of the discussions were focused on bugfixing the existing versions. Some of those discussions were extremely boring, for example talking for 15 minutes about the meaning of “it” in a particular sentence, or the similarly long definition of a straw vote (a vote that does not mean anything) :). But many of those could be resolved by taking the discussions offline to the interested subgroup.

I’m (as expected) in the collectives subgroup (as the “deputy” of the group chairman) and the Generalized Requests subgroup (I’m not sure if this subgroup will live long because the solution to the problem is kind of trivial). I hope to see a Mini-MPI subgroup soon. It’s interesting to see how carefully changes are applied to this standard, and I really like the process so far. I hope it stays like this.

It is a pleasant 4.5-hour drive from Bloomington to Chicago (it took much longer on the way back because we were stuck in traffic for about an hour in Chicago). I realized two funny things about the US traffic system:

1) there are freeway exits on the left (and no pre-warning signs!!) – man, this is dangerous

2) there is a toll-road towards Chicago – the toll is 15 cents (really!). We paid with a $20 bill (nobody had coins) and the “cashier” was really angry that she had to walk to the office to get change for us *grin*. 15 cents … man.

EuroPVM

Some pictures from the conference …

I am talking about non-blocking collectives here – we had an interesting question/discussion session afterwards :).

  The reception in the French Senate – pretty classy ;).

Not too much more going on so far (besides some “pub-events” :-))

HPL MPI/NBC Patch

Yes, it is as cryptic as it sounds :). I finally (after a couple of months) finished merging Christian’s MPI_Bcast() patch with my LibNBC patch to enable the use of non-blocking collectives in HPL. The performance has to be investigated in more detail; LibNBC will probably provide benefits if lookahead is used, but this clearly needs more investigation. I posted it on the webpage because some people asked me to publish it … so feel free to play around with the patches. I’d also be happy to receive any feedback (yes, even bugs :)).

http://www.unixer.de/research/nbcoll/hpl/
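For those curious what the overlap looks like, here is a very rough sketch of the lookahead idea using LibNBC-style calls – treat the exact names (NBC_Ibcast, NBC_Wait, NBC_Handle, nbc.h) as my recollection rather than a verbatim excerpt; the real code is in the patches linked above.

#include <mpi.h>
#include <nbc.h>

/* start broadcasting the current panel, update the trailing matrix with the
   previous panel while the broadcast is in flight, then complete it */
void panel_bcast_overlap(double *panel, int count, MPI_Comm comm) {
  NBC_Handle handle;
  NBC_Ibcast(panel, count, MPI_DOUBLE, 0, comm, &handle);

  /* ... trailing-matrix update (lookahead) goes here ... */

  NBC_Wait(&handle);
}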

The MPI_Bcast patch seems to break several MPI libraries (e.g., Open MPI (not the trunk 🙂) and MPICH2) because it uses really huge datatypes. It works with MVAPICH and newer Open MPI trunk versions.