Talk about Remote Memory Access at San Diego Supercomputer Center/UCSD

On Friday, I visited SDSC and UCSD in San Diego to present recent work on remote memory access programming in a joint CS/SDSC seminar.

I believe the paradigm reaches way beyond MPI (indeed, it doesn’t include messages at all and thus the name MPI is somewhat misleading). In the talk titled “Remote Memory Access Programming: Faster Parallel Computing Without Messages”, I discussed performance issues when programming cache-coherent shared memory systems and RMA as a potential solution. Then I went into quite some detail on MPI-3 RMA as an example and our recently proposed extension “Notified Access”. The slides are here:

I really enjoyed giving the talk to the mixed SDSC and UCSD/CS audience. The talk was early in the morning at 9am, and the rest of the day was filled with one-on-one meetings with CS faculty members in the Systems and HPC area and several researchers at SDSC. I had many interesting discussions and learned a lot. Very nice meeting overall. Thanks to Mike Norman, Scott Baden, and Laura Carrington for arranging the visit!

In fact, I didn’t do all the work for the talk alone — I had lots of help from others:

IPDPS 2015 in Hyderabad, India

Last week, Roberto and I went to IPDPS where his paper was accepted. I was also invited to give a keynote at the HIPS/LSPP workshop as well as an invited talk at the PLC workshop.

Some impressions below:

We were staying in “real India” and had a nice and interesting 20 minute walk to the conference every morning.

This is why I am saying “real India”: the conference itself was not quite in India. Well, physically yes, but there were two rows of high fences and guards between it and outside India ;-).

Keynote at the HIPS/LSPP workshop on performance modeling. The slides are here:

Invited talk at the PLC workshop (on MODESTO, data-centric optimization of complex stencil codes). You can clearly see my standard pose :-). Slides are here:

The PLC audience, very well attended for a workshop.

The IPDPS plenary talk (we got a best paper award).

Actually, Roberto was supposed to give the talk but it would have been his first public talk. So he convinced me to do it but had to promise to give it back at ETH. I’m waiting, Roberto :-)!!!

The audience (not well visible because nobody wanted to sit in the front rows, as usual 😉 ).

1st SPCL Barbecue

We continued our tradition of celebrating past successes with a party at SPCL. This time, we had several best papers and some other wins, so we needed a party that would outgrow my apartment. Thus, we decided to occupy a hill nearby and have a back-to-nature barbecue.

Some consumption statistics for 16 people (for future planning):

  • five bottles + 3 liters wine
  • 30 bottles of beer
  • 1 bottle brandy
  • nearly no consumption of non-alcoholic beverages (strange, we had 4l water and 9l juices)
  • 1.8 full-plate quiches
  • a bit of hummus / cucumbers etc.
  • 1/2 large bowl potato salad
  • 1 bowl of stick bread
  • 1 loaf of bread (we had three)
  • 12 burger patties, 20 sausages
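
For future planning, the numbers above can be scaled to a different group size. This is a minimal sketch, assuming consumption grows linearly with the number of guests and that everything is rounded up to whole units; the function name `scale_shopping_list` and the item labels are made up for illustration.

```python
from fractions import Fraction
from math import ceil

# Consumption observed for 16 people, taken from the list above.
CONSUMED_BY_16 = {
    "wine (bottles)": 5,
    "wine (extra liters)": 3,
    "beer (bottles)": 30,
    "brandy (bottles)": 1,
    "quiche (full plates)": 1.8,
    "burger patties": 12,
    "sausages": 20,
}


def scale_shopping_list(consumed, guests_then, guests_now):
    """Scale observed consumption linearly and round up to whole units.

    Assumption: per-person consumption stays the same for the new group.
    Fractions avoid floating-point surprises when rounding up.
    """
    factor = Fraction(guests_now, guests_then)
    return {
        item: ceil(Fraction(str(amount)) * factor)
        for item, amount in consumed.items()
    }


if __name__ == "__main__":
    for item, amount in scale_shopping_list(CONSUMED_BY_16, 16, 24).items():
        print(f"{item:22s} {amount}")
```

Rounding up is deliberate: running out of sausages is worse than having one left over.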

Thanks to everybody who contributed! Here are some impressions:

We started with stick-bread (a German tradition). The goal is to bake bread in the fire using only a stick, without losing or burning it. Requires some skill.

Then we started the adventurous grill, made almost entirely of wood (others had occupied the BBQ spot with the metal equipment).

The view was nice and the weather cooperated nicely.

Of course, it was only a matter of time until the wood construction caught fire …

Wood grill v2, a bit better (watch the additional support structures). And somehow the stickbread must have been inspiring.

(Parts of) the group.

Other parts were making contacts with the cows, who first ignored us …

… but soon learned that we had beer …

… and then seemed rather happy.

Making charcoal in the fire.

The night was the nicest with the campfire.

… the rain-front moved in at 10:40-ish, exactly as predicted by the weather app provided by MeteoSwiss. Very well done! Unfortunately, the multi-day forecast was not that great ;-).

How to meet a paper deadline

Science is all about producing knowledge and insights and communicating both to other scientists (or industry). The main media of communication are papers, talks, and increasingly social media (Twitter, blogs, etc.). The most important and impactful are still scientific papers but they can often be strengthened by the other communication media.

In computer science (CS), serious publication venues are almost always conferences that happen at particular times each year. These come with submission deadlines set in order to allow enough time for a review cycle. Such deadlines are strict, meaning that you’re either in or out. I personally believe that deadlines are a great way to accelerate research because they create a specific goal to work towards and force you to wrap up and document results. However, the binary nature of deadlines can lead to frustration, and meeting them requires careful planning. I’ll now summarize eight rules and techniques I learned (partially the hard way) while hunting hundreds of deadlines as a student, group leader, and professor together with my students.

1) plan early: Have a complete plan ready months before, it will change, but you need a plan. Start with an outline and milestones. Ask questions: What are the key points, how do I explain or show them? What experiments do I need? How long will they take? How do I communicate the idea most efficiently (think about analogies and good examples)? Of course, you need the key idea set at the beginning. I suggest starting to plan 2-4 months before the real deadline.

2) start writing immediately: As early as possible (while doing the research), write down everything. A good researcher always documents his ideas, thoughts, and experiments; he’s always writing. Distill the key points into a working draft. This draft is not wasted, it can be used to extract a conference publication and it can be published as a technical report to provide more information. You should always document what you do.

3) test early: In CS, experiments most likely require some code. While developing this code, test it. Test it in the final configuration. Do not rely on “I think it’s good” until it’s too late. Ideally, develop small regression tests. Always validate simulations and emulations at the beginning. You’ll need this anyway and you don’t want to run everything twice.

4) set a hard deadline: This is the most important point. You need a HARD deadline sufficiently long before the real deadline. You need to be absolutely serious about meeting this deadline at any cost, work through weekends and nights, etc. I’d recommend one or two weeks before the real deadline. This provides buffer and will reduce stress. Ideally, you’ll do nothing (or not much) between this deadline and the real deadline. This gives you the opportunity to make the paper great. In the worst case, you find a major problem and need to work through until the real deadline. Yet, this is less stressful than realizing 24 hours before the deadline that there is a major issue. Remember: set it, be serious, and stick to it at any cost.

5) take it seriously: Meet your own deadline. Seriously. There is always a next deadline, and working on anything other than this hard task is always more attractive. But deadlines often come only once a year, so missing them can have a serious impact on your career.

6) prioritize and trade off: It’s never possible to do everything you think of to perfection. So decide what is most important, set deadlines for milestones, and in the worst case meet them by simplifying the goal. Never, never, never trade off scientific integrity!!

7) manage your collaborators: Keep them involved and make them see your progress. Make sure they always know how they can help. Pull rather than push, i.e., show that you’re working hard and hope that their honor will drive them. Avoid collaborations where this shows no effect. Do not wait: work, help, and minimize dependencies. I have seen cyclic waiting before. Agree on milestones and deadlines (including the hard one) in advance.

8) focus when it gets tight: If it looks like you may not be able to meet your own deadline (which is of course well in advance of the actual deadline) then focus. Cut everything non-essential such as group meetings, talks, chats, excuse yourself from teleconferences etc. (your peers will understand). I strive for a two-week advance personal deadline and begin to cut heavily when it gets tight three weeks before the actual deadline.

Planning is key and the main tools are milestones and self-set deadlines (to be taken seriously). You know that you failed if you have to work very hard the week or day before the deadline (you should of course always work hard, but voluntarily 🙂 ).
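
The date arithmetic behind rules 1, 4, and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_milestones`, the default of three planning months (approximated as 30 days each), and the example deadline are hypothetical, chosen to match the ranges given in the rules.

```python
from datetime import date, timedelta


def plan_milestones(real_deadline, buffer_weeks=2, planning_months=3):
    """Derive self-set milestones from the real submission deadline.

    Hypothetical helper: the defaults mirror the numbers in the text
    (hard deadline one or two weeks early, planning 2-4 months ahead).
    A month is approximated as 30 days.
    """
    hard_deadline = real_deadline - timedelta(weeks=buffer_weeks)
    # Rule 8: start cutting non-essentials one week before the hard deadline.
    start_cutting = hard_deadline - timedelta(weeks=1)
    start_planning = real_deadline - timedelta(days=30 * planning_months)
    return {
        "start_planning": start_planning,
        "start_cutting": start_cutting,
        "hard_deadline": hard_deadline,
        "real_deadline": real_deadline,
    }


if __name__ == "__main__":
    # Example date is made up for illustration.
    for name, day in plan_milestones(date(2015, 4, 30)).items():
        print(f"{name:15s} {day.isoformat()}")
```

Putting these dates into a shared calendar at planning time makes it harder to quietly move the hard deadline later.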

On the run to Baden

After probably the most stressful month that I have ever encountered (14 deadlines in five weeks, one really important one), I had to get out to do something fun. Well, I have only been to the Alps three times in 2.5 years because it takes at least half a day, and I had to pick something more efficient this time as I don’t have too much time to spare. And living in Switzerland has the advantage that it’s likely that something nice is at your doorstep :-). The decision was to run with a friend from Urdorf to Baden. Google Maps says 14 km, so it cannot be that bad (we guessed maximum 2 hours). Well, we didn’t account for the crazy paths in the woods and hills: it was far from straight and we ended up running 22.1 km total.

Our route … many times, we had to back-track due to dead-ends. It didn’t help that the GPS in the cheap Android phone only worked for 30 minutes :-(.

Well, 22 km seems like absolutely no problem but it was a total ascent of 2055 meters over nearly 8 km followed by about the same descent, also over 8 km. So only 6 km were flat. That was the tough part. The diagram shows the elevation, the axis is around 200 m.

Below some impressions of the snowy hills. It was around -3 degrees Celsius, but didn’t feel cold at all!

After one hour of running, the promised sun showed up (kudos MeteoSwiss). The tracks were tough, all frozen snow, and we both fell :-).

View from the first break, nice!

Most Swiss bunkers had nice icicles, makes a good defense I guess.

More steep paths … uff, many iced with abysses :-).

Finally in Baden (it was not easy to find with the broken GPS). And one last climb up to the old castle.

The remains of the old castle.

The view into the valley with the train station.

A new Promising Open Access Journal in HPC/Supercomputing!

The recent open-access journal movement is spreading quickly. It is indeed a very good idea to establish journals that are free to the whole community since the community does the research, the writing, and the refereeing while printed journal copies become less and less relevant. One such journal recently appeared to support the high-performance computing/supercomputing community: “Supercomputing Frontiers and Innovations”.

The journal’s leads are Jack Dongarra and Vladimir Voevodin and they are supported by a world-class editorial board (spoiler: I am on the board as well).

The first volume appeared in two parts: part one and part two. As one would expect from an open-access journal, one can download all articles and the whole journal as pdf. I am happy to have one of the limited-edition hard-copies of the second journal:

I published an overview of collective operation algorithms and analytic performance models for time and energy in this journal. It has been generally very pleasant to work with the staff and the open access guarantees quick and wide distribution without paywalls.

I read both issues with great interest and found the papers of very high quality. Superfri has a good chance to quickly emerge as a leading journal in high-performance computing. Submissions are open at

11 SPCL@ETH activities at SC14

The Intl. Supercomputing (SC) conference is clearly the main event in HPC. Its program is broad and more than 10k people attend annually. SPCL is mainly focused on the technical program, which makes SC the top-tier conference in HPC. It is the main conference of a major ACM SIG (SIGHPC).

This year, SPCL members co-authored three technical papers in the very competitive program with several thousand attendees! One was even nominated for the best student paper award and, to give it away upfront, we got it! Congrats Maciej! All talks were very well attended (more than 100 people in the room).

All of these talks were presented by collaborators, so I was hoping to be off the hook. Well, not quite, because I gave seven (7!) invited talks at various events and participated in teaching a full-day tutorial on advanced MPI. The highlight was a keynote at the LLVM workshop. I was also running around all the time because I co-organized the overall workshop program (with several thousand attendees) at SC14.

So let me share my experience of all these exciting events in chronological order!

1) Sunday: IA3 Workshop on Irregular Applications: Architectures & Algorithms

This workshop was very nice. It was kicked off by top-class keynotes from Onur Mutlu (CMU) and Keshav Pingali (UT) and continued with great paper talks and a panel in the afternoon. I served on the panel with some top-class people and it was a lot of fun!

Giving my panel presentation on accelerators for graph computing.

Arguing during the panel discussion (Hadoop right now) with (left to right): Keshav Pingali (UT Austin), John Shalf (Berkeley), me (ETH), Clayton Chandler (DOD), Benoit Dupont de Dinechin (Kalray), Onur Mutlu (CMU), Maya Gokhale (LLNL). A rather argumentative group :-).

My slides can be found here.

2) Monday – LLVM Workshop

It was long overdue to discuss the use of LLVM in the context of HPC. So thanks to Hal Finkel and Jeff Hammond for organizing this fantastic workshop! I kicked it off with some considerations about runtime-recompilation and how to improve codes.

The volunteers counted around 80 attendees in the room! Not too bad for a workshop. My slides on “A case for runtime recompilation in HPC” are here.

3) Monday – Advanced MPI Tutorial

Our tutorial attendee numbers keep growing! More than 67 people registered but it felt like more were showing up for the tutorial. We also released the new MPI books, especially the “Using Advanced MPI” book which shortly after became the top new release on Amazon in the parallel processing category.

4) Tuesday – Graph 500 BoF

There, I released the fourth Green Graph 500 list. Not much new happened on the list (same as for the Top500 and Graph500) but the BoF was still fun! Peter Kogge presented some interesting views on the data of the list. My slides can be found here.

5) Tuesday – LLVM BoF

Concurrently with the Graph 500 BoF was the LLVM BoF, so I had to speak at both at the same time. Well, that didn’t go too well (I’m still only one person — apologies to Jim). I only made 20% of this BoF but it was great! Again, very good turnout, LLVM is certainly becoming more important every year. My slides are here.

6) Tuesday – Simulation BoF

There are many simulators in HPC! Often for different purposes but also sometimes for similar ones. We discussed how to collaborate and focus our efforts better. I represented LogGOPSim, SPCL’s discrete event simulator for parallel applications.

My talk summarized features and achievements and slides can be found here.

7) Tuesday – Paper Talk “Slim Fly: A Cost Effective Low-Diameter Network Topology”

Our paper was up for Best Student Paper and Maciej did a great job presenting it. But no need to explain, go and read it here!

Maciej presenting the paper! Well done.

8) Wednesday – PADAL BoF – Programming Abstractions for Data Locality

Programming has to become more data-centric as architectures evolve. This BoF followed an earlier workshop in Lugano on the same topic. It was great — no slides this time, just an open discussion! I hope I didn’t upset David Padua :-).

Didem Unat moderated and the panelists were — Paul Kelly (Imperial), Brad Chamberlain (Cray), Naoya Maruyama (TiTech), David Padua (UIUC), me (ETH), Michael Garland (NVIDIA). It was a truly lively BoF :-).

But hey, I just got it in writing from the Swiss that I’m not qualified to talk about this topic — bummer!

The room was packed and the participation was great. We didn’t get to the third question! I loved the education question, we need to change the way we teach parallel computing.

9) Wednesday – Paper Talk “Understanding the Effects of Communication and Coordination on Checkpointing at Scale”

Kurt Ferreira, a collaborator from Sandia, spoke on unexpected overheads of uncoordinated checkpointing analyzed using LogGOPSim (it’s a cool name!!). Go read the paper if you want to know more!

Kurt speaking.

10) Thursday – Paper Talk “Fail-in-Place Network Design: Interaction between Topology, Routing Algorithm and Failures”

Presented by Jens Domke, a collaborator from Tokyo Tech (now at TU Dresden). A nice analysis of what happens to a network when links or routers fail. Read about it here.

Jens speaking.

11) Thursday – Award Ceremony

Yes, somewhat unexpectedly, we got the best student paper award. The second major technical award in a row for SPCL (after last year’s best paper).

Happy :-).

Coverage by Michele @ HPC-CH and Rich @ insideHPC.

The MPI 3.0 Book – Using Advanced MPI

Our book on “Using Advanced MPI” will appear in about a month, so now is the time to pre-order on Amazon at a reduced price. It is published by the prestigious MIT Press and is a must-read for parallel computing experts.

The book contains everything advanced MPI users need to know. It presents all important concepts of MPI 3.0 (including all newly added functions such as nonblocking collectives and the largely extended One Sided functionality). But the key is that the book is written in an example-driven style. All functions are motivated with use-cases and working code is available for most. This follows the successful tradition of the “Using MPI” series lifting it to MPI-3.0 and hopefully makes it an exciting read!

David Bader’s review hits the point:

With the ubiquitous use of multiple cores to accelerate applications ranging from science and engineering to Big Data, programming with MPI is essential. Every software developer for high performance applications will find this book useful for programming on modern multicore, cluster, and cloud computers.

Here is a quick overview of the contents:

Section 1: “Introduction” provides a brief overview of the history of MPI and briefly summarizes the basic concepts.

Section 2: “Working with Large Scale Systems” contains examples of how to create highly-scalable systems using nonblocking collective operations, the new distributed graph topology for MPI topology mapping, neighborhood collectives, and advanced communicator creation functions. It equips readers with all information to write codes that are highly-scalable. It even describes how fault-tolerant applications could be written using a high-quality MPI implementation.

Section 3: “Introduction to Remote Memory Operations” is a gentle and light introduction to RMA (One Sided) programming using MPI-3.0. It starts with the concepts of memory exposure (windows) and simple data movement. It presents various example problems followed by practical advice to avoid common pitfalls. It concludes with a discussion on performance.

Section 4: “Advanced Remote Memory Access” will make you an expert in RMA programming. It covers advanced concepts such as passive target mode and allocating MPI windows, illustrated with various examples. It also discusses memory models and scalable synchronization approaches.

Section 5: “Using Shared Memory with MPI” explains MPI’s approach to shared memory. MPI-3.0 added support for allocating shared memory, which essentially enables the new hybrid programming model “MPI+MPI”. This section explains the guarantees that MPI provides (and what it does not provide) and several use-cases for shared memory windows.

Section 6: “Hybrid Programming” provides a detailed discussion on how to use MPI in cooperation with other programming models, for example threads or OpenMP. Hybrid programming is emerging as a standard technique and MPI-3.0 introduces several functions to ease the cooperation with others.

Section 7: “Parallel I/O” is most important in the future Big Data world. MPI provides a large set of facilities to support operations on large distributed data sets. We discuss how MPI supports contiguous and noncontiguous accesses as well as the consistency of file operations. Furthermore, we provide hints for improving the performance of MPI I/O.

Section 8: “Coping with Large Data” covers what happens once Big Data sets are in main memory and need to be communicated. MPI-3.0 supports handling large data (>2 GiB) through derived datatypes. We explain how to enable this support and the limitations of the current interface.
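
To make the limitation concrete: MPI-3.0 communication calls take their element count as a C `int`, so a single call cannot name more than 2^31 - 1 elements directly. One common workaround, sketched here in Python as plain arithmetic rather than real MPI calls, is to wrap a fixed number of elements into a contiguous derived datatype (for example via `MPI_Type_contiguous`) and send a small count of those, plus a remainder in the base type. The helper name `split_large_count` and the default chunk size are assumptions for illustration; the book's own examples may differ.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit C int, as used for MPI counts


def split_large_count(n_elements, chunk_elements=2**30):
    """Split an element count into (number of chunks, leftover elements).

    Each chunk stands for one instance of a contiguous derived datatype
    holding `chunk_elements` base elements. Both returned values must fit
    into a C int so they can be passed as MPI count arguments.
    """
    if n_elements < 0:
        raise ValueError("element count must be non-negative")
    if not 0 < chunk_elements <= INT_MAX:
        raise ValueError("chunk size itself must fit into a C int")
    n_chunks, remainder = divmod(n_elements, chunk_elements)
    if n_chunks > INT_MAX:
        raise OverflowError("too many chunks; choose a larger chunk size")
    return n_chunks, remainder


if __name__ == "__main__":
    n = 5 * 2**30 + 7  # more elements than a single int count can express
    chunks, rest = split_large_count(n)
    print(f"{n} elements -> {chunks} chunks of 2**30 plus {rest} single elements")
```

In real MPI code the first value would be the count for the derived datatype and the second the count for a follow-up transfer of the base type.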

Section 9: “Support for Performance and Correctness Debugging” is aimed at very advanced programmers as well as tool developers. It describes the MPI tools interface, which allows introspection of the internals of the MPI library. Its flexible interface supports performance counters and control variables to influence the behavior of MPI. Advanced expert programmers will love this interface for architecture-specific tuning!

Section 10: “Dynamic Process Management” explains how processes can be created and managed. This feature enables growing and shrinking of MPI jobs during their execution and fosters new programming paradigms if it is supported by the batch systems. We only discuss the MPI part in this chapter though.

Section 11: “Working with Modern Fortran” is a must-read for Fortran programmers! How does MPI support type-safe programming and what are the remaining pitfalls and problems in Fortran?

Section 12: “Features for Libraries” addresses advanced library writers and describes principles for developing portable high-quality MPI libraries.