Producing Unseen Morphological Variants in Statistical Machine Translation

EACL 2017 · Matthias Huck, Aleš Tamchyna, Ondřej Bojar, Alexander Fraser

Translating into morphologically rich languages is difficult. Although the coverage of lemmas may be reasonable, many morphological variants cannot be learned from the training data. We present a statistical translation system that is able to produce these inflected word forms. Unlike most previous work, we do not split morphological prediction and lexical choice into two consecutive steps. Our approach is novel in that it is integrated into decoding and takes advantage of context information from both the source and the target language sides.
