The University of Helsinki Submissions to the WMT19 Similar Language Translation Task

WS 2019 · Yves Scherrer, Raúl Vázquez, Sami Virpioja

This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 similar language translation task. We trained neural machine translation models for the language pairs Czech <-> Polish and Spanish <-> Portuguese. Our experiments focused on different subword segmentation methods, and in particular on the comparison of a cognate-aware segmentation method, Cognate Morfessor, with character segmentation and unsupervised segmentation methods for which the data from different languages were simply concatenated. We did not observe major benefits from cognate-aware segmentation methods, but further research may be needed to explore larger parts of the parameter space. Character-level models proved to be competitive for translation between Spanish and Portuguese, but they are slower in training and decoding.
