Training Large Neural Networks with Constant Memory using a New Execution Algorithm

13 Feb 2020 · Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Sujeeth Bharadwaj

Widely popular transformer-based NLP models such as BERT and GPT have enormous capacity, trending toward billions of parameters. Current execution methods demand brute-force resources such as HBM devices and high-speed interconnectivity for data parallelism...
