Cross-lingual Language Model Pretraining

22 Jan 2019 · Guillaume Lample · Alexis Conneau

Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI, our approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the previous best approach by more than 4 BLEU.
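The masked language modeling (MLM) objective that the evaluation table below refers to is a BERT-style cloze task applied to text streams from many languages with a shared BPE vocabulary. Below is a minimal PyTorch sketch of the input-corruption step, not the authors' code: `MASK_ID` and `VOCAB_SIZE` are illustrative placeholders, and positions are sampled uniformly here, whereas the paper additionally reweights sampling by token frequency.

```python
# Minimal sketch (not the authors' code) of MLM input corruption.
import torch

MASK_ID = 4          # assumed id of the <mask> token
VOCAB_SIZE = 64000   # assumed size of the shared BPE vocabulary

def mask_tokens(tokens: torch.Tensor, mask_prob: float = 0.15):
    """Corrupt a batch of token ids for MLM training.

    15% of positions are selected; of those, 80% become <mask>,
    10% a random token, and 10% stay unchanged (the Devlin et al.
    recipe the paper follows). Returns (inputs, targets), with
    targets set to -100 at unselected positions so cross-entropy
    ignores them.
    """
    targets = tokens.clone()
    selected = torch.rand(tokens.shape) < mask_prob
    targets[~selected] = -100

    inputs = tokens.clone()
    roll = torch.rand(tokens.shape)
    inputs[selected & (roll < 0.8)] = MASK_ID
    random_ids = torch.randint(VOCAB_SIZE, tokens.shape)
    replace = selected & (roll >= 0.8) & (roll < 0.9)
    inputs[replace] = random_ids[replace]
    # the remaining selected positions keep their original token
    return inputs, targets

batch = torch.randint(5, VOCAB_SIZE, (2, 16))  # stand-in token ids
inputs, targets = mask_tokens(batch)
```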

Full paper: https://arxiv.org/abs/1901.07291

Evaluation


| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Unsupervised Machine Translation | WMT2014 English-French | MLM pretraining for encoder and decoder | BLEU | 33.4 | #2 |
| Unsupervised Machine Translation | WMT2014 French-English | MLM pretraining for encoder and decoder | BLEU | 33.3 | #2 |
| Unsupervised Machine Translation | WMT2016 English-German | MLM pretraining for encoder and decoder | BLEU | 26.4 | #2 |
| Unsupervised Machine Translation | WMT2016 English-Romanian | MLM pretraining for encoder and decoder | BLEU | 33.3 | #1 |
| Unsupervised Machine Translation | WMT2016 German-English | MLM pretraining for encoder and decoder | BLEU | 34.3 | #2 |
| Machine Translation | WMT2016 Romanian-English | MLM pretraining | BLEU | 35.3 | #1 |
| Unsupervised Machine Translation | WMT2016 Romanian-English | MLM pretraining for encoder and decoder | BLEU | 31.8 | #1 |
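The model descriptor "MLM pretraining for encoder and decoder" in the rows above means both halves of the translation model are initialized from the same pretrained cross-lingual MLM before MT training. Below is a hypothetical sketch of one way to do that initialization using PyTorch's stock `nn.Transformer` layers; the module names are PyTorch's, not the authors', and decoder cross-attention has no pretrained counterpart, so it keeps its random initialization.

```python
# Hypothetical sketch: seeding an MT encoder and decoder from one MLM.
import torch.nn as nn

d_model, n_heads, n_layers, vocab = 512, 8, 6, 64000

# Stand-ins for the pretrained pieces: in an XLM-style setup the token
# embeddings and the self-attention stack come from the pretrained MLM.
shared_embed = nn.Embedding(vocab, d_model)
mlm_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)

# Encoder of the MT model: reuse the pretrained stack directly.
mt_encoder = mlm_encoder

# Decoder: a fresh decoder whose self-attention and feed-forward
# parameters are copied from the corresponding pretrained layers;
# cross-attention (multihead_attn) stays randomly initialized.
mt_decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), n_layers)
for dec_layer, enc_layer in zip(mt_decoder.layers, mlm_encoder.layers):
    dec_layer.self_attn.load_state_dict(enc_layer.self_attn.state_dict())
    dec_layer.linear1.load_state_dict(enc_layer.linear1.state_dict())
    dec_layer.linear2.load_state_dict(enc_layer.linear2.state_dict())
```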