CS498: Hot Topics in High-Performance Computing: Networking and Fault Tolerance
This is the class webpage for the course CS498 Hot Topics in High-Performance Computing: Networking and Fault Tolerance. The class is taught by Prof. Franck Cappello (Fault Tolerance) and Prof. Torsten Hoefler (Networking).
Topics: Hot Topics in High Performance Parallel Computing: Networks and Fault Tolerance. Large-scale computer systems such as current Petascale and upcoming Exascale machines pose significant challenges for system and software designers. In this course, we will address two very important topics in this design: HPC networking and fault tolerance. The network will soon be the most expensive and critical part of large machines, and fault tolerance is needed to ensure correct operation as the probability of failure of individual components increases. This course requires basic knowledge of graph theory and system architecture. This section is open to undergraduate and graduate students, offering 3 or 4 credits, respectively.
Class Wiki: The full slides, all administrative details, and additional class materials are posted in the Class Wiki.
The source books for each lecture's slides are listed in the wiki.
Networking (Prof. Hoefler)
The lecture is divided into multiple sections that do not correspond to the class numbers (some sections span multiple classes). The slides are for reference only; all analytic models, equations, and examples will be developed at the whiteboard so that students can follow the constructions in detail and the class stays interactive. Students are therefore encouraged to take their own notes.
Fault Tolerance (Prof. Cappello)
© Torsten Hoefler