7 dataset results for Automatic Post-Editing AND German

WMT 2016 News (WMT 2016 News Translation Task)

News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into 5 languages (Czech, German, Finnish, Romanian, Russian, Turkish) and additional 1500 sentences from each of the 5 languages translated to English. For Romanian a third of the test set were released as a development set instead. For Turkish additional 500 sentence development set was released. The sentences were selected from dozens of news websites and translated by professional translators. The training data consists of parallel corpora to train translation models, monolingual corpora to train language models and development sets for tuning. Some training corpora were identical from WMT 2015 (Europarl, United Nations, French-English 10⁹ corpus, Common Crawl, Russian-English parallel data provided by Yandex, Wikipedia Headlines provided by CMU) and some were update (CzEng v1.6pre, News Commentary v11, monolingual news data). Additionally,

23 PAPERS • 8 BENCHMARKS

MLQE-PE

MLQE-PE (Multilingual Quality Estimation and Automatic Post-editing Dataset)

The Multilingual Quality Estimation and Automatic Post-editing (MLQE-PE) Dataset is a dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, as well as titles of the articles where the sentences were extracted from, and the neural MT models used to translate the text.

20 PAPERS • NO BENCHMARKS YET

WMT 2018 News (WMT 2018 News Translation Task)

News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into 5 languages (Chinese, Czech, Estonian, German, Finnish, Russian, Turkish) and additional 1500 sentences from each of the 7 languages translated to English. The sentences were selected from dozens of news websites and translated by professional translators.

8 PAPERS • NO BENCHMARKS YET

SubEdits

SubEdits is a human-annnoated post-editing dataset of neural machine translation outputs, compiled from in-house NMT outputs and human post-edits of subtitles form Rakuten Viki. It is collected from English-German annotations and contains 160k triplets.

2 PAPERS • NO BENCHMARKS YET

APE (Automatic Post-Editing)

APE is useful to evaluate Machine Translation automatic post-editing (APE), which is the task of improving the output of a blackbox MT system by automatically fixing its mistakes. The act of post-editing text can be fully specified as a sequence of delete and insert actions in given positions.

1 PAPER • NO BENCHMARKS YET

WMT 2015 News (WMT 2015 News Translation Task)

1 PAPER • NO BENCHMARKS YET

WMT 2016 IT (WMT 2016 IT Translation Task)

The IT Translation Task is a shared task introduced in the First Conference on Machine Translation. Compared to WMT 2016 News, this task brought several novelties to WMT:

1 PAPER • NO BENCHMARKS YET

Datasets

7 dataset results for Automatic Post-Editing AND German