ZeRO: Memory Optimization Towards Training A Trillion Parameter Models

4 Oct 2019 · Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He

Training large DL models with billions and potentially trillions of parameters is challenging. Existing solutions exhibit fundamental limitations to obtain both memory and scaling (computation/communication) efficiency together...
