Search Results for author: Dmitri Vainbrand

Found 1 paper, 1 paper with code

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

1 code implementation • 9 Apr 2021 • Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

In this paper, we show how different types of parallelism methods (tensor, pipeline, and data parallelism) can be composed to scale to thousands of GPUs and models with trillions of parameters.
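A minimal sketch of how these three parallelism degrees compose: with t-way tensor, p-way pipeline, and d-way data parallelism, the GPU count must equal t × p × d, and each global rank maps to one (data, pipeline, tensor) coordinate. The function name and the rank-ordering convention (tensor-parallel ranks innermost) below are illustrative assumptions, not Megatron-LM's actual API.

```python
def parallel_coords(rank: int, tensor: int, pipeline: int, data: int):
    """Map a global GPU rank to (data, pipeline, tensor) coordinates.

    Assumes tensor-parallel ranks are contiguous (innermost), then
    pipeline stages, then data-parallel replicas -- an illustrative
    convention, not necessarily Megatron-LM's.
    """
    assert 0 <= rank < tensor * pipeline * data, "rank out of range"
    t = rank % tensor                    # tensor-parallel index
    p = (rank // tensor) % pipeline      # pipeline stage
    d = rank // (tensor * pipeline)      # data-parallel replica
    return d, p, t


if __name__ == "__main__":
    # Example: 8 GPUs = 2-way tensor x 2-way pipeline x 2-way data parallelism.
    for r in range(8):
        print(f"rank {r} -> (data, pipe, tensor) = {parallel_coords(r, 2, 2, 2)}")
```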

Language Modelling
