IPDPS’09 report

I’m just back from IPDPS 2009. Overall, it was a nice conference, with the usual ups and downs. I had several papers at workshops, of which I had to present three (I was planning on only two, but one of my co-authors fell sick and couldn’t attend). They were all very well received (better than I hoped/expected).

I’ve been attending the CAC workshop for several years and have been pleasantly surprised each year. It only has high-quality papers and an acceptance rate of about 50% (be very careful with this metric, some of the best conferences in CS have a very high rate ;)). This year’s program was nicely laid out. The keynote speaker, Wu Feng, presented his view on green computing, and my talk was next. It was a perfect fit: Wu pretty much asked for more data, and I presented the data of our (purely empirical) study. My other talk presented the work on NBC of the group in Aachen – nicely done, I like the idea with the Hamiltonian path numbering but am wondering if one could do better (suggestions for a proof idea are welcome!).

Some talks were remarkable: Ashild’s talk about “Deadlock-Free Reconfiguration” was very interesting for me. Brice’s talk about “Decoupling Memory Pinning from the Application” reminded me a bit of the pipelined protocol in Open MPI. I’m not sure if I like it or not because it seems to hinder overlapping of computation and communication. The last talk, about improving the RDMA-based eager protocol, presented a hybrid between eager and rendezvous for often-used buffers (each buffer has a usage count and is registered after some number of uses). However, the empirical results seemed to indicate that this only makes sense for larger buffers. And I agree with D.K. Panda’s comment that one could just decrease the protocol switching point for all considered applications. However, the idea could be very interesting for some applications with varying buffer usage.
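
To illustrate the usage-count idea from that last talk, here is a minimal sketch in C. It only models the bookkeeping described above (count the uses of a buffer, register it once it has been used often enough). The type and function names and the threshold of four uses are made up for illustration and are not taken from the paper; a real implementation would call the network’s registration routine where the comment says so.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold for illustration; the talk's actual value is not known here. */
enum { REGISTER_AFTER_USES = 4 };

typedef enum { PROTO_EAGER_COPY, PROTO_ZERO_COPY } protocol_t;

/* Hypothetical per-buffer bookkeeping entry. */
typedef struct {
    const void *addr;       /* start of the user's send buffer */
    size_t      len;        /* buffer length in bytes */
    unsigned    uses;       /* how often this buffer has been sent so far */
    int         registered; /* nonzero once the buffer counts as registered */
} buffer_entry_t;

/* Count one more use of the buffer and decide how to send from it. */
protocol_t choose_protocol(buffer_entry_t *buf)
{
    assert(buf != NULL);
    buf->uses++;
    if (!buf->registered && buf->uses >= REGISTER_AFTER_USES) {
        /* A real implementation would pin/register the memory here;
           this sketch only records the decision. */
        buf->registered = 1;
    }
    return buf->registered ? PROTO_ZERO_COPY : PROTO_EAGER_COPY;
}
```

With this threshold, the first three sends from a buffer take the copying eager path and the fourth one switches to the registered path.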

IPDPS was in Rome this year, and I don’t like Rome. I think it’s the dirtiest European city I know, and I had to stay for a week. The catering at IPDPS was bad as usual (only not-so-good cookies in the coffee breaks and an unspectacular dinner). But I wasn’t there for the food anyway.

The main track was ok. I didn’t agree with some of the best paper selections. The OS jitter talk was interesting and contained some new data; however, it wasn’t clear what the new fundamental findings were. I suppose I have to read the paper. Some other theoretical papers seemed interesting, but I also need to read the articles. The panel was nice. I mostly agreed with Prof. Kale, who stated that caches are getting much less important, and with Prof. Pingali, who wants to consider locality. I seriously wonder what happened to all those dataflow architectures – I think they are a worthwhile alternative to multicore systems. I had already been following Nir Shavit’s activities, and I liked his keynote presentation about TM, even though there are obvious open problems.

Friday’s LSPP workshop was very interesting too. This was my second year at this workshop and I like it a lot (large-scale processing seems to be gaining importance). I enjoyed Abhinav’s talk, which perfectly motivated mine (it was right after his), and I enjoyed the lively discussion during and after my talk (sorry for delaying the schedule). I’m also happy to see that there is now an asynchronous profiling layer for the cell messaging layer (mini-cell-MPI).

I did not enjoy the flight back … Italy is awful (the train ran late, the airport was overcrowded and super-slow, boarding was a catastrophe because I was on the waiting list until 5 minutes before departure, …). But I was able to upgrade to first class in the US so that my last flight was at least comfortable. Here are some pictures from a five-hour walk through Rome. We didn’t really pay attention because we were busy chatting :):

The Spanish steps (don’t ask … it was on the map).

Some random river …

Yep, I was there (we think it’s the Vatican in the background).

That’s simple: the Colosseum (and some arch).

The view from my hotel. I couldn’t stay in the conference hotel because it was overbooked. I wasn’t mad because this one was significantly cheaper and nicer :).

Commencement (finally)

Yes, I had my commencement yesterday. I know, I got the Ph.D. six months ago. However, I didn’t have the time to attend last year’s commencement but still wanted to do it. I didn’t know what it would be like but it’s actually kind of fun (my roommate said “everybody looks like in Harry Potter” – and she was right). I am now officially endorsed (by the President of Indiana University) to carry the title “Doctor of Philosophy in Computer Science”. It’s funny, my advisor Prof. Lumsdaine had the honor to “hood” me officially (he placed the big hood thing on my back, which apparently is the sign of a Ph.D.).

Here’s a picture of my advisor and me in the ceremony’s apparel:

The best part (the doctoral hood in cream&crimson (IU’s white&red)) is unfortunately on the back:

The bottom line is that it’s much, much cooler than in Germany, where I received my Diplom (cf. Masters) in an old office from the secretary. Even winning the Best Student Award was much less spectacular (and I had to bring my own clothes). Here is a picture from the award ceremony – on the right is the Chancellor (cf. President) of TU Chemnitz:

I’ll try to get a picture of IU’s president in full apparel :).

Fun with the N810’s GPS

I never really used the GPS feature of the N810. It sucked badly when I got it, but it seems to be ok now. So I tried to record a walk from my home to the nearest grocery store – and it worked like a charm (ok, the fix took quite a while but that’s ok). I used MaemoMapper to record the data and was even able to visualize the route:

All I can say: sweet! Something more to play with ;).

Some random facts:
Length: 2.21 miles
Vertical up: 1171.3 ft
Vertical down: 1154.9 ft

The MPI Forum gathers momentum

We’ve now been convening for more than a year and we just finished the 9th meeting! Along the way, we released the rather unspectacular MPI-2.1 at EuroPVM 2008 in Dublin, which didn’t really change anything (but hey, everything is in a single document now!).

Then, we decided to go for MPI-2.2, which might change something but doesn’t break anything! We’re still unsure whether we will allow ABI changes, though. But MPI-2.2 will certainly be source-compatible (so a recompile might be required – which seems not that bad to me). The MPI-2.2 process is supposed to guarantee quality. We use the trac system here at IU to manage the changes. Each “ticket” represents a change, which first has to be reviewed unofficially by at least four members of the Forum. Then, it can be read in front of the whole Forum at any meeting. Then, we have a first and a second vote, and each successful ticket has to pass both. At the end, we vote on the inclusion of each chapter in MPI-2.2. Each ticket must go through this procedure, and only a single state change is allowed during each meeting. This gives the Forum and the public a long time (>8 months) to review the proposals carefully. We also require an (open-source) implementation of each proposed change.
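
The ticket procedure is essentially a small state machine, so here is a rough sketch of it in C. This is only my reading of the rules described above (review, reading, first vote, second vote, at most one state change per meeting); the type and function names are invented for illustration and have nothing to do with how the trac system actually stores tickets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a ticket's life cycle; names are illustrative. */
typedef enum {
    TICKET_REVIEWED,     /* informally reviewed by at least four members */
    TICKET_READ,         /* read in front of the whole Forum */
    TICKET_FIRST_VOTE,   /* passed the first vote */
    TICKET_SECOND_VOTE   /* passed the second vote (final state in this sketch) */
} ticket_state_t;

typedef struct {
    ticket_state_t state;
    int last_change_meeting;  /* meeting of the last state change, -1 if none */
} ticket_t;

/* Advance the ticket by one state at the given meeting.
   Returns 1 on success, 0 if the ticket is already final or has
   already changed state during this meeting. */
int ticket_advance(ticket_t *t, int meeting)
{
    assert(t != NULL && meeting >= 0);
    if (t->state == TICKET_SECOND_VOTE) return 0;
    if (t->last_change_meeting == meeting) return 0;
    t->state = (ticket_state_t)(t->state + 1);
    t->last_change_meeting = meeting;
    return 1;
}
```

In this model a ticket needs at least three different meetings to get from its reading to a passed second vote, which is where the long review period comes from.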

We’ve been discussing MPI-2.2 for several meetings – but the last (April’09) meeting was an important milestone! Since we plan to release MPI-2.2 at this year’s EuroPVM, we had to close the door. This effectively means that all tickets that have not been read in this meeting are postponed to MPI-3. I think we did pretty well and we’re within our schedule.

Some tickets that I think are interesting are:

Add a local Reduction Function – this enables the user to use MPI reduction operations locally (without communication). This is very useful for library implementors (e.g., implementing new collective routines on top of MPI)


Regular (non-vector) version of MPI_Reduce_scatter – this addresses a kind of missing functionality. The current Reduce_scatter should be Reduce_scatterv … but it isn’t. Anyway, if you ever asked yourself “why the heck should I use Reduce_scatter?”, then think about parallel matrix multiplication! An example is attached to the ticket.

Add MPI_IN_PLACE option to Alltoall – nobody knows why this is not in MPI-2. I suppose that it seemed complicated to implement (an optimized implementation is indeed NP-hard), but we have a simple (non-optimal, linear-time) algorithm to do it. It’s attached to the ticket :).

Fix Scalability Issues in Graph Topology Interface – this is in my opinion the most interesting/important addition in MPI-2.2. The graph topology interface in MPI-2.1 is horribly broken in that every process needs to provide the *full* graph to the library (which even for sparse graphs leads to $\Omega(P)$ memory *per node*). I think we have an elegant fix that enables a fully distributed specification of the graph as well as a variant in which each node specifies its own neighbors. This will be even more interesting in MPI-3, when we start to use the topology as communication context.

Extending MPI_COMM_CREATE to create several disjoint sub-communicators from an intracommunicator – a neat feature that allows you to create multiple communicators with a single call!

Add MPI_IN_PLACE option to Exscan – again, I don’t know why this is missing. The rationale that is given is not convincing.

Define a new MPI_Count Datatype – MPI-2.1 can’t send more than 2^31 (about 2 billion) objects on 32-bit systems right now – we should fix that!

Add const Keyword to the C bindings – the most discussed feature, I guess 🙂 – I am not sure about the consequences yet, but it seems nice to me (so far).

Allow concurrent access to send buffer – most programmers probably did not know that this is illegal, but it certainly is. For example:

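int sendbuf = 0; /* the same buffer is passed to both sends */
MPI_Request req[2];

/* two nonblocking sends read the same buffer concurrently */
MPI_Isend(&sendbuf, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req[0]);
MPI_Isend(&sendbuf, 1, MPI_INT, 2, 1, MPI_COMM_WORLD, &req[1]);
/* MPI_Waitall takes the request array plus a status array */
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);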

is not valid! Two threads are also not allowed to concurrently send the same buffer. This proposal will allow such access.

MPI_Request_free bad advice to users – I personally think that MPI_Request_free is dangerous (especially in the context of threads) and does not provide much to the user. But we can’t get rid of it … so let’s discourage users from using it!

Deprecate the C++ bindings – that’s funny, isn’t it? But look at the current C++ bindings, they’re nothing more than pimped C bindings and only create problems. Real C++ programmers would use Boost.MPI (which internally uses the C bindings ;)), right?
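
To make the in-place Alltoall ticket from the list above a bit more concrete, here is a minimal sketch of one simple approach, a pairwise exchange. It is not necessarily the algorithm attached to the ticket. To keep it runnable without an MPI installation, it simulates all ranks in one address space with one int per block; the function name `alltoall_in_place_sim` and the buffer layout are assumptions for illustration. In a real MPI program each rank would instead exchange its block with every peer, for example with MPI_Sendrecv_replace.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate an in-place all-to-all among nprocs "ranks" in one address space.
   bufs is an nprocs x nprocs row-major array: row r is rank r's buffer and
   bufs[r * nprocs + j] is the element rank r wants to send to rank j.
   Afterwards, bufs[r * nprocs + j] holds the element rank j sent to rank r. */
void alltoall_in_place_sim(int *bufs, int nprocs)
{
    assert(bufs != NULL && nprocs > 0);
    for (int r = 0; r < nprocs; r++) {
        /* Each pair (r, peer) swaps its two blocks exactly once,
           so no second buffer of the full size is needed. */
        for (int peer = r + 1; peer < nprocs; peer++) {
            int tmp = bufs[r * nprocs + peer];
            bufs[r * nprocs + peer] = bufs[peer * nprocs + r];
            bufs[peer * nprocs + r] = tmp;
        }
    }
}
```

Each rank takes part in one exchange per peer, so the number of steps per rank grows linearly with the number of processes, and only one block of temporary storage is needed at a time.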

We also made some progress regarding MPI-3, where we can add more complex features that might (!) change the interface (but not break backwards compatibility). So we voted on Nonblocking Collective Operations (#109, my hobbyhorse) – and it passed unanimously!

For all votes, see votes.

Oh man, National car rental is so cool ….

They had a Mustang convertible for me … how nice. Some pictures below … I love this car.

Isn’t it sweet?

And it was just the right weather to go topless *yay*.

It even had a nicer steering wheel than the other Mustangs – and you should hear the sound. Oh man, a car has to sound nice and deep and manly (not like Donald Duck – some of you know what I mean).

and it also looks kinda aggressive 😉

The Cisco Headquarters in San Jose

Now that I’ve been here multiple times, I thought I just have to try the thing they call “Cisco Burger” in their cafeteria :). So I got one and must say that it’s not better than most American burgers I had before (but what did I expect). Here’s a picture for completeness:

First Class to San Francisco

Yeah, I tried to save money for the lab and booked the cheapest flight from Indianapolis to San Francisco to attend the nth MPI Forum. It was only $120, but really really stupid. The flight consisted of two legs, Indianapolis to Philadelphia (YES PHL! East coast *hmpf*) and then Philadelphia to San Francisco (>6 hrs). Oh man, I didn’t realize this when I booked. My colleague took a more expensive direct flight which left one hour later while arriving three hours earlier :). Ah, anyway – I got a complimentary first class upgrade on this flight – so it was awesome. Flying first class is actually better than working at home because there are people who serve you drinks and food (obviously as much as you want). The only missing thing was Internet – but anyway – it’s better than an office. I got a lot of work done and the flight clearly ended too quickly.

Just for documentation purposes, here is the lunch that US Airways served (not quite like in a restaurant but actually not bad). I was upgraded again recently and they had really excellent fish.

Me@Apple

Oh man, I would have never believed it. People who know me know that I’d never do that voluntarily – but I’ve been at the Apple headquarters to get a very special present for a very special person. I also visited my good friend Doug who now works at Apple :). Here are some proof-pictures that I’ve really been there. It’s not worse than me@Microsoft I guess ;).

1 Infinite Loop is their actual address 🙂

the headquarters

December MPI Forum

Today was the first official reading of my Nonblocking Collective Operation proposal for MPI-3. It was a bit too short but it went really well. There was a lot of discussion about clarifying the text, but the semantics are mainly fixed now. It looks like everything can be fixed before the next meeting. A picture made by Rolf is here:

Now that it’s getting interesting – I should probably start an MPI blog about new features for MPI-2.2 or MPI-3 :).