Publications of Torsten Hoefler
Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler:

 Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging

(IEEE Transactions on Parallel and Distributed Systems. Vol 32, Nr. 7, pages 1725-1739, IEEE, 2021)



Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations to achieve the same accuracy as their globally-communicating counterparts. We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. The key insight is a combination of algorithmic changes to the averaging scheme and the use of a group allreduce operation. We prove the convergence of WAGMA-SGD, and empirically show that it retains convergence rates equivalent to Allreduce-SGD. For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale. Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput (e.g., 2.1x on 1,024 GPUs for reinforcement learning), and achieves the fastest time-to-solution (e.g., the highest score using the shortest training time for Transformer).
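The subgroup weight exchange described in the abstract can be illustrated with a small NumPy simulation. This is only a minimal sketch under stated assumptions, not the WAGMA-SGD implementation: it leaves out the local SGD updates and the wait-avoiding behaviour, and it runs in a single process instead of using a group allreduce. The function names (`groups_for_step`, `group_average`) and the butterfly-style rule for choosing groups are hypothetical and chosen for illustration; the abstract does not specify how groups are formed.

```python
import numpy as np


def num_phases(num_workers, group_size):
    """Return k such that group_size ** k == num_workers (assumed exact power)."""
    phases, span = 0, 1
    while span < num_workers:
        span *= group_size
        phases += 1
    if span != num_workers:
        raise ValueError("num_workers must be a power of group_size")
    return phases


def groups_for_step(num_workers, group_size, step):
    """Hypothetical grouping rule: ranks that differ only in one base-S digit
    form a group, and the digit rotates with the step (butterfly pattern)."""
    stride = group_size ** (step % num_phases(num_workers, group_size))
    groups = []
    for rank in range(num_workers):
        # A rank whose digit at this phase is 0 is the first member of its group.
        if (rank // stride) % group_size == 0:
            groups.append([rank + j * stride for j in range(group_size)])
    return groups


def group_average(weights, group_size, step):
    """Replace each worker's weights (one row per worker) by its group's mean."""
    averaged = weights.copy()
    for group in groups_for_step(weights.shape[0], group_size, step):
        averaged[group] = weights[group].mean(axis=0)
    return averaged


if __name__ == "__main__":
    # 8 simulated workers, each holding a one-parameter "model".
    weights = np.arange(8.0).reshape(8, 1)
    for step in range(3):
        weights = group_average(weights, group_size=2, step=step)
        print(step, weights.ravel())
```

In this sketch each step only exchanges weights inside groups of size 2, yet with 8 workers three consecutive steps leave every replica at the global mean. That is the intuition behind replacing a global allreduce with cheaper group-level averaging; how this interacts with stale or in-flight updates is covered by the paper, not by this example.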


Publisher URL: https://ieeexplore.ieee.org/document/9271898


  author={Shigang Li and Tal Ben-Nun and Giorgi Nadiradze and Salvatore Di Girolamo and Nikoli Dryden and Dan Alistarh and Torsten Hoefler},
  title={{Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging}},
  journal={IEEE Transactions on Parallel and Distributed Systems},

© Torsten Hoefler