IPDPS’09 report

I’m just back from IPDPS 2009. Overall, it was a nice conference, with some ups and downs as usual. I had several workshop papers and ended up presenting three of them (I had planned on only two, but one of my co-authors fell sick and couldn’t attend). They were all very well received (better than I had hoped or expected).

I have been attending the CAC workshop for several years now and have been pleasantly surprised each year. It features only high-quality papers at about a 50% acceptance rate (be very careful with this metric; some of the best conferences in CS have a very high rate ;)). This year’s program was nicely laid out. The keynote speaker, Wu Feng, presented his view on green computing, and my talk was next. It was a perfect fit — Wu pretty much asked for more data, and I presented the data from our (purely empirical) study. My other talk presented the NBC work of the group in Aachen – nicely done; I like the idea with the Hamiltonian-path numbering but wonder if one could do better (suggestions for a proof idea are welcome!).

Some talks were remarkable: Ashild’s talk about “Deadlock-Free Reconfiguration” was very interesting to me. Brice’s talk about “Decoupling Memory Pinning from the Application” reminded me a bit of the pipelined protocol in Open MPI; I’m not sure whether I like it, because it seems to hinder the overlap of computation and communication. The last talk, about improving the RDMA-based eager protocol, presented a hybrid between eager and rendezvous for often-used buffers (each buffer has a usage count and is registered after some number of uses). However, the empirical data seemed to indicate that this only makes sense for larger buffers, and I agree with D.K. Panda’s comment that one could simply decrease the protocol switching point for all considered applications. Still, the idea could be very interesting for applications with varying buffer usage.

It was in Rome this year, and I don’t like Rome. I think it’s the dirtiest European city I know, and I had to stay for a week. The catering at IPDPS was bad as usual (only not-so-good cookies in the coffee breaks and an unspectacular dinner). But I wasn’t there for the food anyway.

The main track was ok. I didn’t agree with some of the best-paper selections. The OS jitter talk was interesting and contained some new data; however, it wasn’t clear what the new fundamental findings were. I suppose I have to read the paper. Some other theoretical papers seemed interesting, but I also need to read those articles. The panel was nice; I mostly agreed with Prof. Kale, who stated that caches are getting much less important, and with Prof. Pingali, who wants to consider locality. I seriously wonder what happened to all those dataflow architectures – I think they are a worthwhile alternative to multicore systems. I had already been following Nir Shavit’s activities, and I liked his keynote presentation about TM, even though there are obvious open problems.

Friday’s LSPP workshop was very interesting too. This was my second year at this workshop and I like it a lot (large-scale processing seems to be gaining importance). I enjoyed Abhinav’s talk, which perfectly motivated mine (it was right after his), and I enjoyed the lively discussion during and after my talk (sorry for delaying the schedule). I’m also happy to see that there is now an asynchronous profiling layer for the cell messaging layer (mini-cell-MPI).

I did not enjoy the flight back … Italy is awful (the train ran late, the airport was overcrowded and super slow, boarding was a catastrophe because I was on the waiting list until 5 minutes before departure, …). But I was able to upgrade to first class in the US, so at least my last flight was comfortable. Here are some pictures from a five-hour walk through Rome. We didn’t really pay attention because we were busy chatting :):

spanish_steps
The Spanish steps (don’t ask … it was on the map).

river
Some random river …

me
Yep, I was there (we think it’s the Vatican in the background).

collosseum
That’s simple — the Colosseum (and some arch).

balcony
The view from my hotel. I couldn’t stay in the conference hotel because it was overbooked. I wasn’t mad because this one was significantly cheaper and nicer :).

Commencement (finally)

Yes, I had my commencement yesterday. I know, I got the Ph.D. six months ago; however, I didn’t have time to attend last year’s commencement but still wanted to do it. I didn’t know what it would be like, but it’s actually kind of fun (my roommate said “everybody looks like they’re in Harry Potter” – and she was right). I am now officially endorsed (by the President of Indiana University) to carry the title “Doctor of Philosophy in Computer Science”. It’s funny: my advisor, Prof. Lumsdaine, had the honor to “hood” me officially (he mounted the big hood thing on my back, which apparently is the sign of a Ph.D.).

Here’s a picture of my advisor and me in the ceremony’s apparel:
front

The best part (the doctoral hood in cream&crimson (IU’s white&red)) is unfortunately on the back:
back

The bottom line is that it’s much, much cooler than in Germany, where I received my Diplom (cf. Masters) in an old office from the secretary. Even winning the Best Student Award was much less spectacular (and I had to bring my own clothes). Here is a picture from the award ceremony – on the right is the Chancellor (cf. President) of TU Chemnitz:
rektor_small

I’ll try to get a picture from IU’s president in full apparel :).

Fun with the N810’s GPS

I never really used the GPS feature of the N810. It sucked badly when I got it, but it seems to be ok now. So I tried to record a walk from my home to the nearest grocery store – and it worked like a charm (ok, the fix took quite a while, but that’s ok). I used Maemo Mapper to record the data and was even able to visualize the route:
krogerwalk

I just say: sweet! — something more to play with ;).

Some random facts:
Length: 2.21 miles
Vertical up: 1171.3 ft
Vertical down: 1154.9 ft

The MPI Forum gathers momentum

We have now been convening for more than a year and just finished the 9th meeting! Along the way, we released the rather unspectacular MPI-2.1 at EuroPVM 2008 in Dublin (but hey, everything is in a single document now!), which didn’t really change anything.

Then, we decided to go for MPI-2.2, which might change something but doesn’t break anything! We’re still unsure whether we will allow ABI changes, though. But MPI-2.2 will certainly be source-code compatible (so a recompile might be required – which seems not that bad to me). The MPI-2.2 process is supposed to guarantee quality. We use the Trac system here at IU to manage the changes. Each “ticket” represents a change that has to be reviewed unofficially by at least four members of the Forum. Then, it can be read in front of the whole Forum at any meeting. After that, we have a first and a second vote, and each successful ticket has to pass both. At the end, we vote on the inclusion of each chapter in MPI-2.2. Each ticket must go through this procedure, and only a single state change is allowed per meeting. This gives the Forum and the public a long time (>8 months) to review the proposals carefully. We also require an (open-source) implementation of each proposed change.

We have been discussing MPI-2.2 for several meetings now – but the last (April ’09) meeting was an important milestone! Since we plan to release MPI-2.2 at this year’s EuroPVM, we had to close the door. This effectively means that all tickets that had not been read at this meeting are postponed to MPI-3. I think we did pretty well, and we’re within our schedule.

Some tickets that I think are interesting are:

Add a local Reduction Function – this enables the user to use MPI reduction operations locally (without communication). This is very useful for library implementors (e.g., for implementing new collective routines on top of MPI).
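A rough sketch of how a library might use it (assuming the MPI_Reduce_local name and signature from the ticket), combining a received partial result into a local accumulator without any communication:

int partial[4] = {1, 2, 3, 4}; /* e.g., just received from a neighbor */
int accum[4] = {0, 0, 0, 0};   /* local accumulator */
/* accum[i] = accum[i] + partial[i], applying the MPI op with no communication */
MPI_Reduce_local(partial, accum, 4, MPI_INT, MPI_SUM);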


Regular (non-vector) version of MPI_Reduce_scatter – this addresses a kind of missing functionality. The current Reduce_scatter should really be called Reduce_scatterv … but it isn’t. Anyway, if you ever asked yourself why the heck you should use Reduce_scatter, think about parallel matrix multiplication! An example is attached to the ticket.
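A minimal sketch of the proposed call (assuming the name MPI_Reduce_scatter_block from the ticket): every process contributes one equal-sized block per process, and each process receives the element-wise reduction of its own block:

int P;
MPI_Comm_size(MPI_COMM_WORLD, &P);
int k = 4;                                  /* block size */
int *sendbuf = malloc(P * k * sizeof(int)); /* one k-element block per peer */
int recvbuf[4];
/* ... fill sendbuf ... */
/* rank i receives the element-wise sum of everyone's i-th block */
MPI_Reduce_scatter_block(sendbuf, recvbuf, k, MPI_INT, MPI_SUM, MPI_COMM_WORLD);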

Add MPI_IN_PLACE option to Alltoall – nobody knows why this is not in MPI-2. I suppose it seemed complicated to implement (an optimized implementation is indeed NP-hard), but we have a simple (non-optimal, linear-time) algorithm to do it. It’s attached to the ticket :).
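A sketch of what the in-place call would look like (assuming the usual MPI_IN_PLACE convention, under which the send count and type arguments are ignored):

int buf[P]; /* block i is sent to and received from rank i, exchanged in place */
MPI_Alltoall(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, buf, 1, MPI_INT, MPI_COMM_WORLD);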

Fix Scalability Issues in Graph Topology Interface – this is, in my opinion, the most interesting/important addition in MPI-2.2. The graph topology interface in MPI-2.1 is horribly broken in that every process needs to provide the *full* graph to the library (which, even for sparse graphs, leads to $\Omega(P)$ memory *per node*). I think we have an elegant fix that enables a fully distributed specification of the graph, where each node specifies only its own neighbors. This will become even more interesting in MPI-3, when we start to use the topology as a communication context.
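A sketch of the distributed specification (assuming the adjacent-specification call, MPI_Dist_graph_create_adjacent, from the proposal); each process passes only its own neighbors – here, a simple ring:

int rank, P;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &P);
int in[2]  = {(rank - 1 + P) % P, (rank + 1) % P}; /* my sources */
int out[2] = {(rank + 1) % P, (rank - 1 + P) % P}; /* my destinations */
MPI_Comm ring;
MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 2, in, MPI_UNWEIGHTED,
                               2, out, MPI_UNWEIGHTED, MPI_INFO_NULL, 0, &ring);

Note that no process ever sees the whole graph, so the memory per node stays proportional to its degree.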

Extending MPI_COMM_CREATE to create several disjoint sub-communicators from an intracommunicator – a neat feature that allows you to create multiple communicators with a single call!

Add MPI_IN_PLACE option to Exscan – again, I don’t know why this is missing. The rationale that is given is not convincing.

Define a new MPI_Count Datatype – MPI-2.1 can’t send more than 2^31 (≈2 billion) objects in a single operation on 32-bit systems right now – we should fix that!

Add const Keyword to the C bindings – the most-discussed feature, I guess 🙂 – I am not sure about the consequences yet, but it seems nice to me (so far).

Allow concurrent access to send buffer – most programmers probably did not know that this is illegal, but it certainly is. For example:

int sendbuf = 42;
MPI_Request req[2];
/* both sends read from the same buffer concurrently -- illegal in MPI-2.1 */
MPI_Isend(&sendbuf, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req[0]);
MPI_Isend(&sendbuf, 1, MPI_INT, 2, 1, MPI_COMM_WORLD, &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

is not valid! Two threads are also not allowed to send from the same buffer concurrently. This proposal will allow such access.
MPI_Request_free bad advice to users – I personally think that MPI_Request_free is dangerous (especially in the context of threads) and does not provide much to the user. But we can’t get rid of it … so let’s discourage users from using it!
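The problem in a nutshell (a sketch of the pattern the standard’s advice encourages): after freeing an active request, no handle is left to wait on, so there is no direct way to learn when the buffer may be reused:

MPI_Request req;
MPI_Isend(buf, n, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
MPI_Request_free(&req); /* request gone -- nothing left to MPI_Wait on */
/* when may buf be reused? only indirect evidence (e.g., a reply from
   the receiver) can tell -- and with threads this gets even murkier */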

Deprecate the C++ bindings – that’s funny, isn’t it? But look at the current C++ bindings: they’re nothing more than pimped C bindings and only create problems. Real C++ programmers would use Boost.MPI (which internally uses the C bindings ;)), right?

We also made some progress regarding MPI-3, where we can add more complex features that might (!) change the interface (but not break backwards compatibility). So we voted on Nonblocking Collective Operations (#109, my hobbyhorse) – and it passed unanimously!
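In a nutshell, the proposal gives each collective a nonblocking variant that returns a request; a sketch under the proposed naming (final MPI-3 names and details may still change, and do_local_work is a hypothetical placeholder):

MPI_Request req;
MPI_Ibcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD, &req);
do_local_work(); /* computation overlapped with the broadcast */
MPI_Wait(&req, MPI_STATUS_IGNORE);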

For all votes, see votes.

Oh man, National car rental is so cool ….

They had a Mustang convertible for me … how nice. Some pictures below … I love this car.

Isn’t it sweet?

And it was just the right weather to go topless *yay*.

It even had a nicer steering wheel than the other Mustangs – and you should hear the sound. Oh man, a car has to sound nice and deep and manly (not like Donald Duck – some of you know what I mean).

and it also looks kinda aggressive 😉

The Cisco Headquarters in San Jose

Now that I’ve been here multiple times, I thought I just had to try the thing they call the “Cisco Burger” in their cafeteria :). So I got one and must say that it’s no better than most American burgers I’ve had before (but what did I expect). Here’s a picture for completeness:

First Class to San Francisco

Yeah, I tried to save money for the lab and booked the cheapest flight from Indianapolis to San Francisco to attend the nth MPI Forum. It was only $120, but really, really stupid. The flight consisted of two legs: Indianapolis to Philadelphia (YES, PHL! East coast *hmpf*) and then Philadelphia to San Francisco (>6 hrs). Oh man, I didn’t realize this when I booked. My colleague took a more expensive direct flight that left one hour later while arriving three hours earlier :). Ah, anyway – I got a complimentary first-class upgrade on this flight – so it was awesome. Flying first class is actually better than working at home because there are people who serve you drinks and food (as much as you want, obviously). The only thing missing was Internet – but anyway – it’s better than an office. I got a lot of work done, and the flight clearly ended too quickly.

Just for documentation purposes, here is the lunch that US Airways served (not quite like in a restaurant, but actually not bad). I was upgraded again recently, and they had really excellent fish.

Me@Apple

Oh man, I would never have believed it. People who know me know that I’d never do that voluntarily – but I’ve been to the Apple headquarters to get a very special present for a very special person. I also visited my good friend Doug, who now works at Apple :). Here are some proof pictures that I’ve really been there. It’s not worse than me@Microsoft, I guess ;).

1 Infinite Loop is their actual address 🙂

the headquarters

December MPI Forum

Today was the first official reading of my Nonblocking Collective Operations proposal for MPI-3. It was a bit too short, but it went really well. There were lots of discussions about clarifying the text, but the semantics are mostly fixed now. It looks like everything can be fixed before the next meeting. A picture taken by Rolf is here:

[click for full-size picture]
Now that it’s getting interesting, I should probably start an MPI blog about new features for MPI-2.2 or MPI-3 :).

Cluster Challenge 2008 – an adviser’s perspective

I did the Cluster Challenge again this year. Last year was fun, but this year was better – we won! Here’s the story:

We started Saturday morning in Bloomington. The travel went pretty smoothly, and Guido picked us up at the airport in Austin. We went directly to the conference location. The time before that had been very stressful because the machine wasn’t working quite as nicely as we would have liked. The biggest problem was that we could not change the CPU frequency of the new iDataPlex system. However, we were able to change it on the older version and saw significant gains. Our benchmarks showed that, within our power constraints (2×13 A), we could run 16 nodes during the challenge and use 12 of them for HPCC (while 4 were idle). So we convinced IBM to give us a pre-release BIOS update that was supposed to enable CPU frequency scaling. And it looked good! We were able to reduce the CPU clock to 2.0 GHz (as on the older systems). However, it was 4 am and we had to ship at 6, so we didn’t have time to test more. But back to Austin …

The guys from Dresden were already waiting for us because the organizers did not allow them to unpack the cluster alone (it was supposed to be a team effort). We unpacked our huge box and rolled our 900-pound cluster into our booth.

Our Cluster

We spent the rest of the day installing the system and pimping (;-)) our booth. It went pretty well. Then we began to deploy our hardware and boot it from USB to do some performance benchmarks.

Installing the fragile Fiber Myrinet equipment (we didn’t break anything!)

We started with HPCC and were shocked twice. Shock number one was that the CPU frequency scaling that had cost us so many sleepless nights did not seem to help. All tools and /proc/cpuinfo showed 2.0 GHz – but the power consumption was still as high as at 2.5 GHz. So we wrote a small RDTSC benchmark to check the CPU frequency – it still ran at 2.5 GHz. The BIOS was lying to us :-(. The second shock was that HPL was twice as slow as it should be. So much for sleep …
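The idea behind such a check is simple. Here is a minimal sketch (not our actual benchmark): count TSC ticks across a known wall-clock interval, assuming x86 and a TSC that ticks at the nominal clock rate:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

static inline uint64_t rdtsc(void) {
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return ((uint64_t)hi << 32) | lo;
}

int main(void) {
  uint64_t start = rdtsc();
  sleep(1); /* wall-clock reference interval */
  uint64_t ticks = rdtsc() - start;
  /* ticks per second approximate the core clock if the TSC runs at it */
  printf("~%.2f GHz\n", ticks / 1e9);
  return 0;
}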
Quite some time after midnight … still hacking on stuff. I’m trying to motivate our guys to keep going (I am a good slave driver).

The students tried to fix it … all night long. The conclusion was that we had to drop our cool boot-from-USB idea due to the performance loss. Later, it turned out that shared-memory communication used /tmp (which was mounted via NFS) and was thus really slow (WTF!). Anyway, about one hour before the challenge started, we decided to fall back to disks. This worked.

How high can one stack hard drives? Not too high, actually ;). Man, it was hard to plug them back into the system.

The second problem was a tough one: the BIOS … lying to us. We were finally able to get hold of an engineer from IBM. He tried hard but couldn’t help us either. So the students had to make the hard decision to run with two nodes fewer :-(.

In the meantime, Bob and I had fun biking in order to power laptops ;).

Bob Beck (UA) generating power on our fancy machine ;).

I was powering my laptop with the sandwiches I had eaten before :).

The Challenge finally starts

The challenge was about to start and the advisors couldn’t do anything anymore, so we decided to get some fuel from the opening banquet for our students on the night shift ;).
Guido and me, thinking about getting some good stuff for the students!

We finally found some good stuff on the show floor *yay*. Advisors’ success!

Some of us were not totally up to speed all the time 😉 – It looks like somebody missed the start:

So the challenge ran, and we had nothing to do (especially the advisors, who were just hanging around to feed and motivate the students). So we did all kinds of weird things overnight – and we had a bike ;).

I also started some coding during the challenge, because I didn’t really have anything to do but it was way too noisy to work on papers. I had to pose inside the Microsoft booth while my laptop finished up some cool things! Thanks to Erez for taking the picture at exactly the right time.

Some Linux-based “research” performed/finished inside the Microsoft booth.

Guido explains Vampir to the other teams on one of our three ultra-cool 41” displays (again, around midnight ;)).

We had really nice speakers at the challenge. Especially on Sunday, when all the others had left, we cranked them up and listened to the soundtrack of Black Hawk Down. The security guys seemed kind of confused to hear really loud bass at 4 am ;).

Guido! Don’t help the “enemies” ;).

YouTube also made it onto our display :). And it nearly cost us a point by drowning out the sound output of our power warning system. Fortunately, Jens noticed it.

Watch for yourself: Achmed the Dead Terrorist

Oh, and there was this Novell penguin that spontaneously caught fire. I guess this happens when experienced computer scientists spend two days installing a completely retarded operating system (with InfiniBand – ask me for details if you’re interested). I love Linux, but it’s a shame that the abbreviation SLES has the word “Linux” in it. Debian or Ubuntu is so much better! But apparently, SLES is better prepared for the applications (clearly not for administration or software maintenance, though).

Each booth was “armed” with at least one student at all times. Here are some images from after midnight.

The MIT booth – doesn’t it look more like Stony Brook?

The folks from Arizona State – they had a neat Cray – with Windows though. But it seems that it worked for them.

The guys from Colorado with Aspen systems (don’t ask them about their vendor partner).

The National Tsing Hua University – excellent people but their system was more of a jet engine than a cluster.

Our booth … note the image on the big screen ;).

The Alberta folks – last year’s champions. Darn good hackers!

Purdue with their SiCortex – they seemed rather annoyed the whole time.

Our social corner: at 2 am, most students didn’t have much to do (just watching jobs), so they all gathered in front of our booth and played cards :).

Two fluffy spectators were watching our oscilloscope animation during the show-off on Thursday.

The team of judges, led by Jack Dongarra, talked to our students to assess their abilities.

After that, we won! We don’t have a picture of our fabulous win yet, but I’ll post it with some more links once I get it.