For a two-phase annotation experiment, we chose Czech and English documents translated by systems submitted to the WMT20 News Translation Task.
We present the Eyetracked Multi-Modal Translation (EMMT) corpus, a dataset containing monocular eye movement recordings, audio and 4-electrode electroencephalogram (EEG) data of 43 participants.
Finally, we show that PCA can be combined with using only 1 bit per dimension.
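To make the idea concrete, here is a minimal sketch of one way PCA and 1-bit-per-dimension storage could be combined. It assumes the inputs are real-valued vectors and that "1 bit per dimension" means keeping only the sign of each projected coordinate; neither detail is stated here, and the function name `pca_1bit` and the random data are purely illustrative.

```python
import numpy as np

def pca_1bit(vectors, k):
    """Project onto the top-k principal components and keep one bit (the sign) per dimension.

    Hypothetical helper: sign-based binarization is an assumption, not a confirmed method.
    """
    X = np.asarray(vectors, dtype=np.float64)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    projected = centered @ components.T
    # One bit per retained dimension: 1 if the coordinate is positive, else 0.
    bits = (projected > 0).astype(np.uint8)
    return bits, mean, components

# Illustrative data only: 100 random 32-dimensional vectors reduced to 16 bits each.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 32))
bits, mean, components = pca_1bit(data, k=16)
packed = np.packbits(bits, axis=1)  # 16 bits fit into 2 bytes per vector
```

Under these assumptions, each 32-dimensional float vector is stored in 2 bytes. How much downstream quality survives depends on the data and the task, which this sketch does not measure.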
Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility.
In most neural machine translation distillation or stealing scenarios, the goal is to preserve the performance of the target model (teacher).
The most common tools for word alignment rely on a large number of parallel sentences, which are then usually processed according to one of the IBM model algorithms.
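As a rough illustration of what such processing involves, the sketch below runs expectation-maximization for IBM Model 1, the simplest member of that family, on a three-sentence toy corpus. It is a simplified teaching version under stated assumptions: the NULL source word is omitted, the corpus is invented for the example, and the helper names `ibm_model1` and `align` are hypothetical rather than taken from any existing tool.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM (IBM Model 1, no NULL word)."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Uniform initialization over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected (fractional) alignment counts.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t

def align(src, tgt, t):
    """Link each target position j to its most probable source position i."""
    return [(max(range(len(src)), key=lambda i: t[(tw, src[i])]), j)
            for j, tw in enumerate(tgt)]

# Toy parallel corpus, invented for illustration.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = ibm_model1(corpus)
links = align(["das", "Haus"], ["the", "house"], t)
```

Even on three sentence pairs, co-occurrence statistics are enough to separate "das" from "Haus". With rarer words the estimates become unreliable, which is why such tools depend on large parallel corpora.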
It is not uncommon for Internet users to have to produce text in a foreign language they know very little of, leaving them unable to verify the translation quality.