WMT 2015 News (WMT 2015 News Translation Task)

Introduced by Bojar et al. in Findings of the 2015 Workshop on Statistical Machine Translation

News translation is a recurring WMT task. The test set is a collection of parallel corpora: about 1500 English sentences translated into each of 5 languages (Czech, German, Finnish, French, Russian), plus an additional 1500 sentences from each of those 5 languages translated into English. For each language pair, the sentences are taken from newspaper articles, except for French, where the test set was drawn from user-generated comments on news articles (from The Guardian and Le Monde). The translations were produced by professional translators.

The training data consists of parallel corpora for training translation models, monolingual corpora for training language models, and development sets for tuning. Some training corpora were unchanged from WMT 2014 (Europarl, United Nations, French-English 10⁹ corpus, CzEng, Common Crawl, Russian-English parallel data provided by Yandex, Wikipedia Headlines provided by CMU), and some were updated (News Commentary, monolingual news data). In addition, the Finnish Europarl and Finnish-English Wikipedia Headlines corpora were added.

