Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

17 Sep 2019 · Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro

Recent work in unsupervised language modeling demonstrates that training large neural language models advances the state of the art in Natural Language Processing applications. However, for very large models, memory constraints limit the size of models that can be practically trained...
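The technique behind the paper is intra-layer (tensor) model parallelism: the weight matrices of each transformer layer are partitioned across GPUs so that most of the computation stays local and only a small number of all-reduces is needed per layer. The following is a minimal single-process sketch of that idea for the transformer MLP block, where the first GEMM is split column-wise and the second row-wise; the two-way split, array shapes, and variable names are illustrative assumptions, not the Megatron-LM code itself.

```python
# Single-process sketch of tensor-parallel MLP math (illustrative, not the
# actual Megatron-LM implementation): split the first weight matrix by columns
# and the second by rows, compute per-"GPU" partial outputs, then sum them,
# which plays the role of the all-reduce.
import numpy as np

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
batch, d_model, d_ff, world_size = 4, 8, 32, 2  # toy sizes, assumed for the sketch

X = rng.standard_normal((batch, d_model))
A = rng.standard_normal((d_model, d_ff))   # first MLP weight
B = rng.standard_normal((d_ff, d_model))   # second MLP weight

# Column-parallel split of A and row-parallel split of B across "GPUs".
A_shards = np.split(A, world_size, axis=1)
B_shards = np.split(B, world_size, axis=0)

# Each rank computes GeLU(X @ A_i) @ B_i independently; no communication is
# needed between the two GEMMs because GeLU is applied element-wise per shard.
partials = [gelu(X @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]

# The all-reduce step: summing the partial outputs reproduces the serial result.
Y_parallel = np.sum(partials, axis=0)
Y_serial = gelu(X @ A) @ B
print("parallel output matches serial:", np.allclose(Y_parallel, Y_serial))
```

Splitting the first GEMM by columns and the second by rows lets the element-wise GeLU run locally on each shard, so no communication is needed between the two GEMMs and a single all-reduce suffices on the block's output.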


Evaluation Results from the Paper


 SOTA for Language Modelling on WikiText-103 (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank | Uses Extra Training Data |
|------|---------|-------|--------|-------|-------------|--------------------------|
| Language Modelling | WikiText-103 | Megatron-LM | Test perplexity | 10.8 | #1 | Yes |
| Language Modelling | WikiText-103 | Megatron-LM | Number of params | 8300M | #1 | Yes |