no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.
no code implementations • IWSLT 2016 • Ondřej Bojar, Ondřej Cífka, Jindřich Helcl, Tom Kocmi, Roman Sudarikov
We present our submissions to the IWSLT 2016 machine translation task, as our first attempt to translate subtitles and one of our early experiments with neural machine translation (NMT).
no code implementations • IWSLT (EMNLP) 2018 • Tom Kocmi, Dušan Variš, Ondřej Bojar
We present our submission to the IWSLT18 Low Resource task focused on translation from Basque to English.
no code implementations • WMT (EMNLP) 2020 • Tom Kocmi
This paper describes CUNI submission to the WMT 2020 News Translation Shared Task for the low-resource scenario Inuktitut–English in both translation directions.
no code implementations • 25 Feb 2022 • Tom Kocmi, Dominik Macháček, Ondřej Bojar
Machine translation is for us a prime example of deep learning applications where human skills and learning capabilities are taken as a benchmark that many try to match and surpass.
3 code implementations • WMT (EMNLP) 2021 • Tom Kocmi, Christian Federmann, Roman Grundkiewicz, Marcin Junczys-Dowmunt, Hitokazu Matsushita, Arul Menezes
Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system's quality over another.
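As a purely illustrative sketch of the kind of metric-based comparison the paper scrutinises, the snippet below scores two hypothetical systems with sacrebleu's BLEU and chrF. The sentences and system names are invented; this is generic library usage, not the paper's evaluation pipeline.

```python
# Illustrative only: scoring two hypothetical MT systems with automatic metrics.
import sacrebleu

references = [["The cat sat on the mat.", "He went home early."]]  # one reference stream
system_a = ["The cat sat on the mat.", "He went home soon."]
system_b = ["A cat is sitting on a mat.", "He left for home early."]

for name, hyps in [("system A", system_a), ("system B", system_b)]:
    bleu = sacrebleu.corpus_bleu(hyps, references)
    chrf = sacrebleu.corpus_chrf(hyps, references)
    print(f"{name}: BLEU={bleu.score:.1f}  chrF={chrf.score:.1f}")
```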
no code implementations • EACL (HumEval) 2021 • Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi
Recent studies emphasize the need of document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments.
no code implementations • 17 Feb 2021 • Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka
We present the first version of a system for interactive generation of theatre play scripts.
no code implementations • EMNLP 2020 • Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri
In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories.
no code implementations • WMT (EMNLP) 2020 • Ivana Kvapilíková, Tom Kocmi, Ondřej Bojar
This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian.
1 code implementation • WMT (EMNLP) 2020 • Tom Kocmi, Tomasz Limisiewicz, Gabriel Stanovsky
Our work presents the largest evidence for the phenomenon in more than 19 systems submitted to the WMT over four diverse target languages: Czech, German, Polish, and Russian.
no code implementations • 6 Jul 2020 • Tom Kocmi, Martin Popel, Ondrej Bojar
We present a new release of the Czech-English parallel corpus CzEng 2.0 consisting of over 2 billion words (2 "gigawords") in each language.
no code implementations • 25 Jun 2020 • Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká
We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts.
no code implementations • 6 Jan 2020 • Tom Kocmi
For the former scenario, we present a proof-of-concept method by reusing a model trained by other researchers.
no code implementations • EAMT 2020 • Tom Kocmi, Ondřej Bojar
To show the applicability of our method, we recycle a Transformer model trained by different researchers and use it to seed models for different language pairs.
no code implementations • WS 2019 • Tom Kocmi, Ondřej Bojar
This paper describes the CUNI submission to the WMT 2019 News Translation Shared Task for the low-resource languages: Gujarati-English and Kazakh-English.
no code implementations • WS 2018 • Tom Kocmi, Roman Sudarikov, Ondřej Bojar
Our main focus was the low-resource language pair of Estonian and English for which we utilized Finnish parallel data in a simple method.
no code implementations • WS 2018 • Tom Kocmi, Ondřej Bojar
We present a simple transfer learning method, where we first train a "parent" model for a high-resource language pair and then continue the training on a low-resource pair only by replacing the training corpus (a minimal code sketch of this idea follows below).
Low-Resource Neural Machine Translation
Transfer Learning
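The parent-then-child recipe described in the entry above can be illustrated with a short, self-contained sketch. It is not the authors' training setup: a toy bag-of-subwords model stands in for a Transformer, random tensors stand in for the parallel corpora, and a shared subword vocabulary between the two language pairs is assumed.

```python
# Minimal sketch of "parent -> child" transfer: train on a high-resource pair,
# then keep training the same parameters on a low-resource pair.
import torch
import torch.nn as nn

VOCAB = 32000  # assumed subword vocabulary shared by both language pairs
model = nn.Sequential(nn.EmbeddingBag(VOCAB, 256), nn.Linear(256, VOCAB))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def toy_corpus(n_batches):
    """Random (source_ids, target_id) batches standing in for a parallel corpus."""
    return [(torch.randint(0, VOCAB, (8, 20)), torch.randint(0, VOCAB, (8,)))
            for _ in range(n_batches)]

def train_on(corpus):
    for src, tgt in corpus:
        optimizer.zero_grad()
        loss_fn(model(src), tgt).backward()
        optimizer.step()

# 1) Train the "parent" model on the high-resource language pair.
train_on(toy_corpus(200))

# 2) Continue training on the low-resource pair with the same parameters and
#    optimizer state -- only the training corpus is replaced.
train_on(toy_corpus(50))
```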
1 code implementation • 18 Jun 2018 • Tom Kocmi, Ondřej Bojar
Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network.
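For readers unfamiliar with skip-gram, the snippet below trains skip-gram vectors (sg=1) with the gensim library on a toy corpus. This is generic library usage shown for illustration, not the code released with the paper.

```python
# Illustrative only: skip-gram word vectors with gensim on a toy corpus.
from gensim.models import Word2Vec

corpus = [
    ["machine", "translation", "maps", "source", "sentences", "to", "target", "sentences"],
    ["skip", "gram", "predicts", "context", "words", "from", "a", "centre", "word"],
]
model = Word2Vec(sentences=corpus, sg=1, vector_size=50, window=3, min_count=1, epochs=20)
print(model.wv.most_similar("translation", topn=3))
```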
no code implementations • WS 2017 • Tom Kocmi, Ondřej Bojar
We support this hypothesis by observing the performance in learning lexical relations and by the fact that the network can learn to perform reasonably in its task even with fixed random embeddings.
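A minimal sketch of the "fixed random embeddings" setting mentioned here, written as a generic PyTorch pattern rather than the authors' exact model: the embedding table keeps its random initialisation and is excluded from optimisation, so only the layers above it adapt.

```python
# Sketch: a randomly initialised, frozen embedding table under a trainable encoder.
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=32000, embedding_dim=512)
embedding.weight.requires_grad = False  # keep the random vectors fixed during training

encoder = nn.GRU(input_size=512, hidden_size=512, batch_first=True)

# The optimizer only receives the parameters that remain trainable.
trainable = [p for m in (embedding, encoder) for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```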
no code implementations • WS 2017 • Tom Kocmi, Dušan Variš, Ondřej Bojar
The paper presents this year's CUNI submissions to the WAT 2017 Translation Task, focusing on Japanese-English translation in the Scientific papers, Patents and Newswire subtasks.
no code implementations • RANLP 2017 • Tom Kocmi, Ondrej Bojar
We examine the effects of particular orderings of sentence pairs on the on-line training of neural machine translation (NMT).
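As a simplified illustration of what "particular orderings of sentence pairs" can mean in practice (not the paper's exact experimental conditions), the sketch below builds two orderings of a tiny parallel corpus: a seeded random shuffle and a short-to-long curriculum by source length.

```python
# Two example orderings of a toy parallel corpus for online training.
import random

pairs = [
    ("Dobrý den .", "Good day ."),
    ("Jak se máš ?", "How are you ?"),
    ("To je velmi dlouhá věta s mnoha slovy .",
     "That is a very long sentence with many words ."),
]

shuffled = pairs[:]
random.Random(0).shuffle(shuffled)                               # baseline: random order

short_to_long = sorted(pairs, key=lambda p: len(p[0].split()))   # one possible curriculum

for ordering in (shuffled, short_to_long):
    for src, tgt in ordering:
        pass  # an online trainer would consume the pairs in exactly this order
```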
1 code implementation • EACL 2017 • Tom Kocmi, Ondřej Bojar
In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text.
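The paper itself takes a neural, character-level approach to this task; the toy sketch below only illustrates language identification via character-trigram overlap and is not the authors' method.

```python
# Toy language identification by character-trigram overlap (illustration only).
from collections import Counter

def char_trigrams(text):
    text = f"  {text.lower()}  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

profiles = {
    "en": char_trigrams("the quick brown fox jumps over the lazy dog"),
    "cs": char_trigrams("příliš žluťoučký kůň úpěl ďábelské ódy"),
}

def identify(text):
    grams = char_trigrams(text)
    # Pick the language whose profile shares the most trigram mass with the input.
    return max(profiles, key=lambda lang: sum((grams & profiles[lang]).values()))

print(identify("the dog jumps"))   # -> 'en'
print(identify("žluťoučký kůň"))   # -> 'cs'
```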