Publications of Torsten Hoefler
Shigang Li, Torsten Hoefler:

 Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

(In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), presented in St. Louis, Missouri, ACM, Nov. 2021)
Best Paper Finalist

Abstract

Training large deep learning models at scale is very challenging. This paper proposes Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for efficiently training large-scale models. Chimera is a synchronous approach and therefore incurs no loss of accuracy, which makes it more convergence-friendly than asynchronous approaches. Compared with the latest synchronous pipeline approach, Chimera reduces the number of bubbles by up to 50%; benefiting from the sophisticated scheduling of bidirectional pipelines, Chimera also has a more balanced activation memory consumption. Evaluations are conducted on Transformer-based language models. For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves the training throughput by 1.16x-2.34x over the state-of-the-art synchronous and asynchronous pipeline approaches.

BibTeX

@inproceedings{nopfs,
  author={Shigang Li and Torsten Hoefler},
  title={{Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines}},
  year={2021},
  month={Nov.},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21)},
  location={St. Louis, Missouri},
  publisher={ACM},
  source={http://www.unixer.de/~htor/publications/},
}