Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units.
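The subword-unit encoding referred to here is byte-pair encoding (BPE): starting from a character-level segmentation, the most frequent adjacent symbol pair is iteratively merged into a new symbol, yielding a vocabulary of variable-length subwords. A minimal sketch of BPE merge learning on a toy word-frequency dictionary (variable names and the corpus are illustrative, not from the paper):

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with its concatenation."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: each word is a space-separated character sequence
# with '</w>' marking the end of the word.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

merges = []
for _ in range(10):  # learn 10 merge operations
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    merges.append(best)
    vocab = merge_pair(best, vocab)

print(merges)  # learned merge operations, in order
print(vocab)   # frequent words collapse into single subword units
```

After a few merges, frequent words such as "newest" become single symbols, while rare words remain decomposed into smaller, reusable subwords — which is what lets the fixed-vocabulary model represent an open vocabulary.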
| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Machine Translation | WMT2015 English-German | BPE word segmentation | BLEU score | 22.8 | # 4 |
| Machine Translation | WMT2015 English-Russian | C2-50k Segmentation | BLEU score | 20.9 | # 1 |