Unsupervised Statistical Machine Translation

EMNLP 2018 Mikel Artetxe • Gorka Labaka • Eneko Agirre

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems.

Evaluation


Task                              Dataset                 Model                                           Metric name  Metric value  Global rank
Machine Translation               WMT2014 English-French  SMT + iterative backtranslation (unsupervised)  BLEU score   26.22         # 28
Machine Translation               WMT2014 English-German  SMT + iterative backtranslation (unsupervised)  BLEU score   14.08         # 30
Machine Translation               WMT2014 French-English  SMT + iterative backtranslation (unsupervised)  BLEU score   25.87         # 1
Unsupervised Machine Translation  WMT2014 French-English  SMT                                             BLEU         25.9          # 5
Machine Translation               WMT2014 German-English  SMT + iterative backtranslation (unsupervised)  BLEU score   17.43         # 3
Machine Translation               WMT2016 English-German  SMT + iterative backtranslation (unsupervised)  BLEU score   18.23         # 3
Machine Translation               WMT2016 German-English  SMT + iterative backtranslation (unsupervised)  BLEU score   23.05         # 3