WMT 2018 News (WMT 2018 News Translation Task)

Introduced by Bojar et al. in Findings of the 2018 Conference on Machine Translation (WMT18)

News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into 7 languages (Chinese, Czech, Estonian, German, Finnish, Russian, Turkish) and an additional 1500 sentences from each of the 7 languages translated into English. The sentences were selected from dozens of news websites and translated by professional translators.

The training data consists of parallel corpora to train translation models, monolingual corpora to train language models, and development sets for tuning. Some training corpora were unchanged from WMT 2017 (Europarl, Common Crawl, SETIMES2, Russian-English parallel data provided by Yandex, Wikipedia Headlines provided by CMU) and some were updated (United Nations, CzEng v1.7, News Commentary v13, monolingual news data). Additionally, the EU Press Release parallel corpus for German, Finnish and Estonian was added.

Source: https://www.statmt.org/wmt18/translation-task.html

