SubCo: A Learner Translation Corpus of Human and Machine Subtitles
In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, and post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included for HT, MT, and PE. Such a corpus is a valuable resource for both the human and machine translation communities, enabling direct comparison, in terms of errors and evaluation, between human translations, machine translations, and post-edited machine translations.
LREC 2016