Search Results for author: Patrick LeGresley

Found 6 papers, 4 papers with code

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

1 code implementation • 9 Apr 2021 • Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

In this paper, we show how different types of parallelism methods (tensor, pipeline, and data parallelism) can be composed to scale to thousands of GPUs and models with trillions of parameters.

Language Modelling
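
To make the composition described in this paper concrete: when all three methods are active, the product of the tensor-, pipeline-, and data-parallel degrees equals the total GPU count, so fixing two degrees determines the third. A minimal sketch of that bookkeeping (the function name and example sizes are illustrative, not taken from the Megatron-LM codebase):

```python
# Sketch: how tensor, pipeline, and data parallelism compose.
# Assumption: the three degrees multiply to the total GPU count,
# as in the paper's setup; names below are illustrative.

def data_parallel_degree(world_size: int,
                         tensor_parallel: int,
                         pipeline_parallel: int) -> int:
    """Return the data-parallel degree implied by the other two."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by "
                         "tensor_parallel * pipeline_parallel")
    return world_size // model_parallel

# Example: 3072 GPUs with 8-way tensor and 16-way pipeline parallelism
# leave 3072 // (8 * 16) = 24-way data parallelism.
print(data_parallel_degree(3072, 8, 16))  # -> 24
```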

Neural ODEs for Image Segmentation with Level Sets

no code implementations • 25 Dec 2019 • Rafael Valle, Fitsum Reda, Mohammad Shoeybi, Patrick LeGresley, Andrew Tao, Bryan Catanzaro

We propose a novel approach for image segmentation that combines Neural Ordinary Differential Equations (NODEs) and the Level Set method.

Image Segmentation • Object Detection +4
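
Abstractly, this method treats the level-set function φ as the state of an ODE, dφ/dt = f_θ(φ, image), where f_θ is learned, and reads the segmentation off as the region where the evolved φ is negative. A rough sketch of that structure with forward-Euler integration (f_theta here is a hand-written placeholder, not the paper's trained network):

```python
import numpy as np

def f_theta(phi: np.ndarray, image: np.ndarray) -> np.ndarray:
    """Placeholder dynamics; the paper learns this with a neural network.

    Here: a simple Laplacian smoothing flow that ignores the image."""
    return (np.roll(phi, 1, 0) + np.roll(phi, -1, 0)
            + np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - 4.0 * phi)

def evolve_level_set(phi0, image, t_final=1.0, steps=100):
    """Integrate d(phi)/dt = f_theta(phi, image) with forward Euler."""
    phi, dt = phi0.copy(), t_final / steps
    for _ in range(steps):
        phi = phi + dt * f_theta(phi, image)
    return phi

# Initialize phi as a signed distance to a circle; the segmentation
# mask is the interior of the zero level set after evolution.
phi0 = np.fromfunction(lambda i, j: np.hypot(i - 32, j - 32) - 10, (64, 64))
mask = evolve_level_set(phi0, image=np.zeros((64, 64))) < 0
```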

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

10 code implementations • 17 Sep 2019 • Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro

To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT.

LAMBADA • Language Modelling +1
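
The core of Megatron-LM's model parallelism is partitioning each transformer GEMM across GPUs: in the MLP block, the first weight matrix is split by columns and the second by rows, so a single all-reduce recovers the full output. A single-process numpy sketch of that partitioning (the sum over parts stands in for the all-reduce, and ReLU stands in for the paper's GeLU):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_parts = 8, 2                      # hidden size, simulated "GPUs"
x = rng.normal(size=(4, d))            # batch of activations
A = rng.normal(size=(d, 4 * d))        # MLP first weight: split by columns
B = rng.normal(size=(4 * d, d))        # MLP second weight: split by rows

A_parts = np.split(A, n_parts, axis=1)
B_parts = np.split(B, n_parts, axis=0)

# Column-parallel: each part computes its slice of the intermediate
# activation independently (elementwise ReLU commutes with the split).
Y_parts = [np.maximum(x @ Ai, 0.0) for Ai in A_parts]

# Row-parallel: each part produces a partial output; summing the
# partials plays the role of the all-reduce across GPUs.
Z = sum(Yi @ Bi for Yi, Bi in zip(Y_parts, B_parts))

# Check against the unpartitioned computation.
assert np.allclose(Z, np.maximum(x @ A, 0.0) @ B)
```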
