Publications of Torsten Hoefler
Torsten Hoefler:

 High-performance distributed memory systems – from supercomputers to data centers

(Presentation - presented virtually, Oct. 2020)
Keynote talk at the 2020 International Symposium on DIStributed Computing (DISC)


We will cover distributed memory programming of high-performance supercomputers and datacenter computers. Starting from the Message Passing Interface, we identify abstractions for distributed computations and carry them through optimizations such as topology mapping and collective communication optimization. We then discuss efficient correction protocols that enable fault tolerance in such high-performance distributed systems. Armed with these insights, we observe that supercomputers are likely to migrate into megadatacenter installations, leading to a general convergence of these architectures. The first step, converging the network interfaces, is well underway, with Remote Direct Memory Access (RDMA) networking gaining general acceptance. RDMA moves the distributed system closer to shared memory with a weakly consistent memory model. We discuss several algorithmic and systems approaches that use RDMA to accelerate distributed replicated state machines, databases, and locking systems by orders of magnitude. Finally, if time allows, we will outline parametric program graphs, a sound abstraction for analyzing and optimizing applications. For each topic, we will identify open problems and provide ideas for further work to deepen our understanding of high-performance distributed memory systems.


