Teaching CS498 at UIUC

I was appointed Adjunct Assistant Professor in Computer Science at UIUC last year. My first service to the CS department is teaching the course CS498 “Hot Topics in HPC: Networks and Fault Tolerance” together with Franck Cappello this semester. It is the first class that I teach in the US academic system and it is a bit different from what I was used to in Germany. Classes are generally smaller (I have around 20-25 students, some of whom are not taking it for credit), which enables a more interactive teaching style. In my networking part, I start by focusing on the theoretical foundations and models for communication, then show practical examples for each of them and how that knowledge helps in practical settings. I think it’s most important to understand the basics (they are also harder to learn and understand from textbooks than the technical details) before diving into practical networks. Teaching this class has been a lot of fun so far, even though the preparations eat up a lot of my weekends. I really enjoy the interactions with the students during class.


I’m teaching the class every Wed and Fri 9:30-10:45am in Siebel 1103. If anyone is interested in the content, check out the class webpage at http://www.unixer.de/CS498/ .

SC10 Best Paper

Yes, we received the SC10 Best Paper Award for our paper “Characterizing the Influence of System Noise on Large-Scale Applications by Simulation”. Congratulations also to Timo and Andrew! SC10 is the premier international venue for HPC research and development. Only 50 of the 253 submitted papers were accepted at SC10, and it was very nice to hear that our paper was one of the best paper nominees (each track nominated a best paper), but I didn’t expect that we would be the best of all the nominees! The final decision was made after the presentations. My talk was in a room that was way too small and completely packed (people were standing along the wall and sitting on the floor in the aisle). The room was “allowed to” host 150 people (sign on the wall) but there were at least 250 in there :-). Glad that there was no firefighter around. Well, the air got rather bad after ten minutes ;-). The talk itself went extremely well: I finished right on time and the audience had a lively discussion that I merely moderated (many questions trying to pinpoint flaws were actually answered by the audience :-)). That was really enjoyable.


Generally, I really enjoyed SC this year. I had so many meetings that I was barely able to check the show floor for goodies. New Orleans was also great (well, my hotel was, let’s say, “suboptimal”, but it was very cheap). I’m looking forward to next year!

Hot Interconnects 2010 and Tutorials at Google

This year’s Hot Interconnects Conference was very special. Not only was it at Google, but I was also on the committee as tutorials chair. The conference was very good and I really enjoyed the keynotes, the invited talks on Exascale interconnects and the many conversations I had. The tutorials also went very well. Here are some impressions:

Raj Jain’s Future Internet tutorials.

A Google bike. It looked even funnier when I rode it ;-).

The facility.

LSAP’10 (HPDC’10) + Argonne Visit

This week, I attended the Workshop on Large Scale Application Performance in Chicago. I was shocked when I arrived at HPDC (side note: I took the train again and it was great!): everything seemed to be about Cloud or Grid or a combination of those (+Life Science). I still don’t fully understand what all this stuff is about and what the fundamental scientific problems are. Well, the workshop was very good! I really enjoyed Barton Miller’s keynote about his MRNet research. It’s good work! I also enjoyed listening to the other workshop papers. My talk went really well (I ran a bit over time but that wasn’t bad). We (Timo, Andrew and I) even got the best paper award for our work! That was nice and unexpected.

All-in-all, it was a really good workshop!

I visited Argonne the next day and gave a talk about Next Generation Collective Operations. This was also very entertaining and it was great to be there. I had some really good conversations with some folks. Thanks for inviting me! I completely forgot to take a picture …

AMP’10 and SC’10 PC meeting

This weekend, I attended the Advances in Message Passing workshop and the Supercomputing 2010 PC meeting. AMP was in Toronto and the SC meeting in New Orleans. Well, the schedule was suboptimal. I had to leave AMP early and catch the last flight from Toronto to New Orleans (7pm). But AMP was clearly worth it! It had a couple of very interesting papers and our own contribution fit very well too! I gave the talk together with Jeremiah (which was an experiment ;-)), and it was a complete success!
The hotel was also funny, it was inside a shopping mall, here’s the view “outside” the window:
Too bad that we had to leave early. Btw., I spent less than 23 hours in Toronto … and two of them in US immigration! Yes, the US immigration is *in Toronto* (wtf!). And of course, we didn’t arrive two hours early at the airport … man, catching the plane was really close (they delayed it by 20 minutes because we were not the only ones who had that problem). This is really weird …

The SC meeting was very nice. I met many friends and colleagues and had many good discussions. New Orleans is not really the nicest place I have been to. I tried to save money and stayed in the “Bourbon Inn”, which is, well, on Bourbon Street. I did not know what Bourbon Street meant when I booked the hotel :-/. Well, it meant no sleep until around 3am :-(. The street is full of nightclubs and strip clubs … kind of odd when you walk back to the hotel after a full day of meetings. Well, I survived (and saved $150). Also, taking the bus to the airport was an adventure. I also survived this one :-).

Bourbon Street (the camera didn’t really work because it was *extremely* humid!).

Late Post: IPDPS’10 PC Meeting

I guess I have to mention the craziest PC meeting I have attended so far: IPDPS’10. Cindy Phillips, the PC chair, scheduled the meeting for Friday 12/6/09 in Albuquerque (a while ago). We met at 6:30am at the hotel (I flew in late the day before and didn’t get too much sleep … which wasn’t really too good). The meeting had only minimal breaks scheduled (10 mins breakfast, 30 mins lunch, etc.) and we started out very slowly with the good papers (and spent way too much time on clear accepts – as usual :-)). Well, the conference also received a record number of submissions, more than 500 papers … my track (Software) was the heaviest. The meeting was very professionally managed by Cindy, good job! However, the number of submissions was just overwhelming. If we assume a 9 hour meeting, we would have about 1 minute per paper, which seems very unrealistic. We spent on average more than three minutes and sometimes much longer. Well, it was evening by the time we were down to the complicated “middle-field”. All in all, the meeting took longer than 17 hours and was extremely exhausting. I think we made good decisions and selected a good program. It should have been a two-day meeting though ;-).

However, IEEE made up for my stress and gave me a certificate of appreciation :-). See

Welcome to Blue Drop!

We got our POWER7 780 (MR) System on Friday and I just logged in :-). I’m alone on something like this:

htor@bd01:~> w
 20:49:12 up 1 day,  1:01,  1 user,  load average: 0.00, 0.00, 0.00
htor     pts/0     20:35    0.00s  0.04s  0.01s w
htor@bd01:~> cat /proc/cpuinfo 
processor       : 0
cpu             : POWER7 (architected), altivec supported
clock           : 3864.000000MHz
revision        : 2.1 (pvr 003f 0201)

processor       : 1
cpu             : POWER7 (architected), altivec supported
clock           : 3864.000000MHz
revision        : 2.1 (pvr 003f 0201)
processor       : 127
cpu             : POWER7 (architected), altivec supported
clock           : 3864.000000MHz
revision        : 2.1 (pvr 003f 0201)

timebase        : 512000000
platform        : pSeries
model           : IBM,9179-MHB
machine         : CHRP IBM,9179-MHB

It’s SMT=4 though (so the 128 logical processors are “only” 32 cores) but with a sweet 128 GiB of memory. With 4 FMAs per cycle per core, that’s 983.04 GF/s, nearly one TF in a single “rack” (that thing below)!
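
For the curious, here is a small sketch of how that peak number comes together. The core count and the 4 FMAs per cycle are from above; counting each FMA as two floating-point operations is the usual convention. The 3840 MHz clock is my assumption, inferred from the 983.04 GF/s figure; the 3864 MHz that /proc/cpuinfo reports would give 989.184 GF/s instead.

```python
# Back-of-the-envelope peak floating-point rate for the box described above.
CORES = 32           # 128 logical processors / SMT=4
FMAS_PER_CYCLE = 4   # per core, as stated in the post
FLOPS_PER_FMA = 2    # one multiply plus one add (usual convention)


def peak_mflops(clock_mhz):
    """Peak rate in MFLOP/s for a given core clock in MHz."""
    return CORES * FMAS_PER_CYCLE * FLOPS_PER_FMA * clock_mhz


# 3840 MHz is an assumed clock that reproduces the quoted figure;
# 3864 MHz is the value shown in the /proc/cpuinfo output.
for clock_mhz in (3840, 3864):
    print(clock_mhz, "MHz ->", peak_mflops(clock_mhz) / 1000, "GF/s")
```

Integer MHz keeps the arithmetic exact; the division into GF/s happens only for printing.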

I particularly like the name … we all hope that the drop will quickly fill up a bucket full of water(s) :-).

(credits: Steve Kleinvehn, original)

PPoPP 2010

The reason for my trip to India was attending the PPoPP conference, which was held in conjunction with HPCA at the Indian Institute of Science in Bengaluru. Most keynotes and the opening session were shared between the two conferences and I really liked the concept of holding a hardware and a programming conference jointly. It led to some interesting discussions for me. They even got a former Indian president to speak at the opening ceremony, here’s an impression:

The PPoPP program was assembled by Mary Hall who had the interesting task of assigning the reviews for a record submission (50% more than the previous year). Here are some statistics on Mary’s slide:

The conference was excellent, actually (one of?) the best conferences that I attended so far. I liked all papers (one was borderline but still ok) which is pretty rare. All in all, just excellent!

The food was great as I said before and the social event was held at an old castle (which was oddly only reachable through a terrible dirt road … and heavily guarded again). Here are some impressions from the castle and the evening program:

That’s it, a short and good conference! I also liked the single-track layout of the talks (even though some sessions were really long ;-)).

SC09 in Portland

Yes, still stuck in Rainland. This year’s SC was the best I have ever attended, to be honest. It felt like I knew everybody and I was invited to two to three parties every evening (awesome). My work was presented at the MPI Forum BoF and the FASTOS BoF. I was also invited to present parts of my MPI Forum work at the MPICH2 BoF, which was great! However, I fell sick on Wednesday and felt really bad on Thursday and much worse on Friday (which made me miss the morning panel and the SC10 committee meeting).

And the two most exciting happenings:

  1. I entered two drawings and won twice! (an X-box Elite from Microsoft *YAY* (thanks Fab!) and some USB stick from TeraGrid (I hope they’ll send it to me))
  2. I was complaining about the low threading support (only four) in the POWER7 and the random guy next to me started to explain why. It turned out that this guy was Burton Smith! He entered my personal hall of fame after his keynote at SPAA 2008. This man knows exactly what he is talking about and we chatted for a long time (until the show closed) about network topologies and routing. It was surprising to me that he mentioned the same fundamental insight into topologies that I had arrived at independently about a month ago. He also studied Cayley graphs and friends … five years ago (D’oh, I’m too young!).

November MPI Forum in Portland

They should have called the place “Rainland” but ok, I brought an umbrella 🙂 .

This week’s MPI Forum was very interesting! Marc Snir presented his convincing hybrid proposal. It’s really nice and orthogonal to the current standard. It needs some minor polishing and an implementation and seems ready to go.

We had some incremental discussions in the collectives working group but nothing very exciting. I think it is time to look for applications/systems that can benefit from the sparse collective proposal. Sameer Kumar sent me a very interesting paper which seems to be what we need! We also assimilated the Topology chapter into the collectives working group (now called collectives and topology working group — short colltop 🙂 ).

The RMA discussions were helpful this time and motivated me to summarize all ideas that floated in my head into a patch to the MPI-2.2 standard document during the weekend. I’ll post it to the RMA list and will see what happens. I think the RMA interface in MPI-2.0 is rather elegant and only needs some minor tweaks and some semantic changes to make it useful.

The MPI-3 discussions were going in circles (again). We went back and forth on whether we should call our next release (which contains nonblocking collectives and probably support for hybrid environments) MPI 2.3 or MPI 3.0 draft. We didn’t come to any conclusion. The only decision we made (I think, we didn’t vote though) is that we don’t want to break source compatibility in the next revision yet. I’d like to call it 2.3 then, because having a 3.0 draft means that 3.0 will be a similar release and we would probably break compatibility in 3.1, which doesn’t seem too useful. 2.3 also gives the user a better impression that it’s not a revolutionary new thing (e.g., fault tolerant). However, I don’t have too strong an opinion, I just have some users who want nonblocking collectives in a release that is at least source compatible.

Another really annoying thing is the whole MPI_Count story. I have to admit that I was in favor of it at the beginning because abstraction seems right to me. However, I am now really against it for several reasons: (1) the workaround is trivial and causes negligible overhead, (2) it breaks source compatibility, which is a total no-go, and (3) it causes all kinds of Fortran 77 problems (it seems that this is the reason why int was selected in the first place). Could we just withdraw the ticket please?
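
To give an idea of what such a workaround can look like, here is a rough sketch of the bookkeeping only. I am assuming the usual trick: wrap a fixed number of elements into one derived datatype (e.g. with MPI_Type_contiguous) and send an int-sized number of those blocks plus a small remainder. The helper name `split_count` and the block length of 2^30 elements are made up for illustration, and no MPI call is made here.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit signed C int


def split_count(total, block_len=2**30):
    """Split an element count into (blocks, remainder), both int-sized.

    `blocks` would be the count of a derived datatype holding `block_len`
    elements; `remainder` elements are sent with the base type.
    Hypothetical helper, not part of any MPI library.
    """
    if total < 0:
        raise ValueError("count must be non-negative")
    if not 0 < block_len <= INT_MAX:
        raise ValueError("block_len must fit into a C int")
    blocks, remainder = divmod(total, block_len)
    if blocks > INT_MAX:
        raise OverflowError("too many blocks, choose a larger block_len")
    return blocks, remainder


# A count that does not fit into an int: five full blocks plus 7 elements.
blocks, remainder = split_count(5 * 2**30 + 7)
print(blocks, remainder)
```

The extra cost is one division and, in real code, one datatype creation, which is why I call the overhead negligible.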