Scaling Neural Machine Translation

EMNLP 2018 • Myle Ott, Sergey Edunov, David Grangier, Michael Auli

Sequence-to-sequence learning models still require several days to reach state-of-the-art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation...
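The abstract names two techniques, reduced precision and large batch training, without showing their mechanics. The sketch below is a toy illustration only and is not taken from the paper's code. It assumes two common ways such techniques are realized: gradient accumulation over micro-batches to emulate a large batch, and loss scaling to keep small gradients from underflowing in half precision. The one-parameter model, the data, the helper names (`to_fp16`, `grad`, `accumulated_grad`) and the scale factor of 1024 are all hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def grad(w: float, x: float, y: float) -> float:
    """Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def large_batch_grad(w, batch):
    """Mean gradient computed over the whole batch in one pass."""
    return sum(grad(w, x, y) for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Same mean gradient, built by summing over smaller micro-batches."""
    total = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        total += sum(grad(w, x, y) for x, y in micro)
    return total / len(batch)


# Hypothetical toy data: (input, target) pairs for a one-parameter model.
batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
w = 0.5

full = large_batch_grad(w, batch)
accum = accumulated_grad(w, batch, micro_batch_size=2)

# Loss scaling: a gradient this small rounds to zero in half precision,
# but survives if it is scaled up before rounding and scaled down afterwards.
tiny_grad = 1e-8
loss_scale = 1024.0  # assumed scale factor, chosen for illustration
unscaled_fp16 = to_fp16(tiny_grad)
scaled_fp16 = to_fp16(tiny_grad * loss_scale)
recovered = scaled_fp16 / loss_scale  # unscaling done in full precision

print(full, accum)
print(unscaled_fp16, recovered)
```

In this toy setting, accumulating over two micro-batches of two examples gives the same averaged gradient as one batch of four, which is why accumulation can stand in for a larger batch when memory is limited. The second half shows why a naive cast to half precision can silently drop small gradients unless they are scaled first.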

Evaluation results from the paper

Task                 Dataset                 Model            Metric name  Metric value  Global rank
Machine Translation  WMT2014 English-French  Transformer Big  BLEU score   43.2          # 4
Machine Translation  WMT2014 English-German  Transformer Big  BLEU score   29.3          # 5