1 code implementation • EACL 2021 • Ebrahim Ansari, Ondřej Bojar, Barry Haddow, Mohammad Mahmoudi
SLTev reports the quality, latency, and stability of an SLT candidate output based on the time-stamped transcript and reference translation into a target language.
no code implementations • EACL 2021 • Ondřej Bojar, Dominik Macháček, Sangeet Sagar, Otakar Smrž, Jonáš Kratochvíl, Peter Polák, Ebrahim Ansari, Mohammad Mahmoudi, Rishu Kumar, Dario Franceschini, Chiara Canton, Ivan Simonini, Thai-Son Nguyen, Felix Schneider, Sebastian Stüker, Alex Waibel, Barry Haddow, Rico Sennrich, Philip Williams
This paper presents an automatic speech translation system aimed at live subtitling of conference presentations.
no code implementations • WS 2020 • Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alexander Waibel, Changhan Wang
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured this year six challenge tracks: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation.
no code implementations • WS 2020 • Peter Polák, Sangeet Sagar, Dominik Macháček, Ondřej Bojar
We complement this ASR with off-the-shelf MT systems to take part also in the speech translation track.
no code implementations • LREC 2020 • Erion Çano, Ondřej Bojar
Recent developments in sequence-to-sequence learning with neural networks have considerably improved the quality of automatically generated text summaries and document keywords, stipulating the need for even bigger training corpora.
no code implementations • LREC 2020 • Vilém Zouhar, Ondřej Bojar
It is not uncommon for Internet users to have to produce a text in a foreign language they have very little knowledge of and are unable to verify the translation quality.
no code implementations • LREC 2020 • Dario Franceschini, Chiara Canton, Ivan Simonini, Armin Schweinfurth, Adelheid Glott, Sebastian Stüker, Thai-Son Nguyen, Felix Schneider, Thanh-Le Ha, Alex Waibel, Barry Haddow, Philip Williams, Rico Sennrich, Ondřej Bojar, Sangeet Sagar, Dominik Macháček, Otakar Smrž
This paper presents our progress towards deploying a versatile communication platform in the task of highly multilingual live speech translation for conferences and remote meetings live subtitling.
no code implementations • LREC 2020 • Shantipriya Parida, Satya Ranjan Dash, Ondřej Bojar, Petr Motlicek, Priyanka Pattnaik, Debasish Kumar Mallick
The preparation of parallel corpora is a challenging task, particularly for languages that suffer from under-representation in the digital world.
no code implementations • WS 2019 • Shantipriya Parida, Ondřej Bojar, Petr Motlicek
This paper describes the Idiap submission to WAT 2019 for the English-Hindi Multi-Modal Translation Task.
no code implementations • WS 2019 • Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Yusuke Oda, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi
This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task.
no code implementations • WS 2019 • Erion Çano, Ondřej Bojar
Using data-driven models for solving text summarization or similar tasks has become very common in the last years.
no code implementations • WS 2019 • Kateřina Rysová, Magdaléna Rysová, Tomáš Musil, Lucie Poláková, Ondřej Bojar
As the quality of machine translation rises and neural machine translation (NMT) is moving from sentence to document level translations, it is becoming increasingly difficult to evaluate the output of translation systems.
no code implementations • WS 2019 • Qingsong Ma, Johnny Wei, Ondřej Bojar, Yvette Graham
This paper presents the results of the WMT19 Metrics Shared Task.
no code implementations • WS 2019 • Ivana Kvapilíková, Dominik Macháček, Ondřej Bojar
In this paper we describe the CUNI translation system used for the unsupervised news shared task of the ACL 2019 Fourth Conference on Machine Translation (WMT19).
no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
1 code implementation • WS 2019 • Tereza Vojtěchová, Michal Novák, Miloš Klouček, Ondřej Bojar
This paper describes a machine translation test set of documents from the auditing domain and its use as one of the "test suites" in the WMT19 News Translation Task for translation directions involving Czech, English and German.
no code implementations • WS 2019 • Tom Kocmi, Ondřej Bojar
This paper describes the CUNI submission to the WMT 2019 News Translation Shared Task for the low-resource languages: Gujarati-English and Kazakh-English.
no code implementations • ACL 2019 • Dušan Variš, Ondřej Bojar
In our method, we initialize the weights of the encoder and decoder with two language models that are trained with monolingual data and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting of the original language modeling task.
no code implementations • NAACL 2019 • Erion Çano, Ondřej Bojar
Most of the proposed supervised and unsupervised methods for keyphrase generation are unable to produce terms that are valuable but do not appear in the text.
no code implementations • WS 2018 • Silvie Cinková, Ondřej Bojar
We present a pilot study of machine translation of selected grammatical contrasts between Czech and English in the WMT18 News Translation Task.
no code implementations • WS 2018 • Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, Christof Monz
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018.
no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
no code implementations • WS 2018 • Ondřej Bojar, Jiří Mírovský, Kateřina Rysová, Magdaléna Rysová
We present the results of automatic evaluation of discourse in machine translation (MT) outputs using the EVALD tool.
no code implementations • WS 2018 • Qingsong Ma, Ondřej Bojar, Yvette Graham
We asked participants of this task to score the outputs of the MT systems involved in the WMT18 News Translation Task with automatic metrics.
no code implementations • WS 2018 • Tom Kocmi, Roman Sudarikov, Ondřej Bojar
Our main focus was the low-resource language pair of Estonian and English for which we utilized Finnish parallel data in a simple method.
no code implementations • WS 2018 • Franck Burlot, Yves Scherrer, Vinit Ravishankar, Ondřej Bojar, Stig-Arne Grönroos, Maarit Koponen, Tommi Nieminen, François Yvon
Progress in the quality of machine translation output calls for new automatic evaluation procedures and metrics.
no code implementations • ACL 2018 • Ondřej Cífka, Ondřej Bojar
One possible way of obtaining continuous-space sentence representations is by training neural machine translation (NMT) systems.
no code implementations • WS 2017 • Tom Kocmi, Dušan Variš, Ondřej Bojar
The paper presents this year's CUNI submissions to the WAT 2017 Translation Task focusing on the Japanese-English translation, namely Scientific papers subtask, Patents subtask and Newswire subtask.
no code implementations • WS 2017 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi
no code implementations • WS 2017 • Jan-Thorsten Peter, Hermann Ney, Ondřej Bojar, Ngoc-Quan Pham, Jan Niehues, Alex Waibel, Franck Burlot, François Yvon, Mārcis Pinnis, Valters Šics, Jasmijn Bastings, Miguel Rios, Wilker Aziz, Philip Williams, Frédéric Blain, Lucia Specia
no code implementations • WS 2017 • Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Karin Verspoor, Ondřej Bojar, Arthur Boyer, Cristian Grozea, Barry Haddow, Madeleine Kittner, Yvonne Lichtblau, Pavel Pecina, Roland Roller, Rudolf Rosa, Amy Siu, Philippe Thomas, Saskia Trescher
no code implementations • EACL 2017 • Matthias Huck, Aleš Tamchyna, Ondřej Bojar, Alexander Fraser
Translating into morphologically rich languages is difficult.
no code implementations • WS 2016 • Bushra Jawaid, Amir Kamran, Ondřej Bojar
This paper focuses on the generation of case markers for free word order languages that use case markers as phrasal clitics for marking the relationship between the dependent-noun and its head.
no code implementations • WS 2016 • Roman Sudarikov, Ondřej Dušek, Martin Holub, Ondřej Bojar, Vincent Kríž
We describe experiments in Machine Translation using word sense disambiguation (WSD) information.
no code implementations • WS 2016 • Jan-Thorsten Peter, Tamer Alkhouli, Hermann Ney, Matthias Huck, Fabienne Braune, Alexander Fraser, Aleš Tamchyna, Ondřej Bojar, Barry Haddow, Rico Sennrich, Frédéric Blain, Lucia Specia, Jan Niehues, Alex Waibel, Alexandre Allauzen, Lauriane Aufrant, Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon, Mārcis Pinnis, Stella Frank
Ranked #12 on Machine Translation on WMT2016 English-Romanian
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi
no code implementations • WS 2015 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, Marco Turchi
no code implementations • LREC 2014 • Bushra Jawaid, Ondřej Bojar
The idea of two-step machine translation was introduced to divide the complexity of the search space into two independent steps: (1) lexical translation and reordering, and (2) conjugation and declination in the target language.
no code implementations • LREC 2014 • Bushra Jawaid, Amir Kamran, Ondřej Bojar
In this paper, we describe a release of a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags.
no code implementations • LREC 2014 • Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman
HindEnCorp consists of 274k parallel sentences (3.9 million Hindi and 3.8 million English tokens).
no code implementations • LREC 2014 • Nianwen Xue, Ondřej Bojar, Jan Hajič, Martha Palmer, Zdeňka Urešová, Xiuhong Zhang
Abstract Meaning Representations (AMRs) are rooted, directional and labeled graphs that abstract away from morpho-syntactic idiosyncrasies such as word category (verbs and nouns), word order, and function words (determiners, some prepositions).
no code implementations • LREC 2012 • Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, Aleš Tamchyna
CzEng 1.0 is automatically aligned at the level of sentences as well as words.
no code implementations • LREC 2012 • Jan Berka, Ondřej Bojar, Mark Fishel, Maja Popović, Daniel Zeman
We present a complex, open source tool for detailed machine translation error analysis providing the user with automatic error detection and classification, several monolingual alignment algorithms as well as with training and test corpus browsing.
no code implementations • LREC 2012 • Mark Fishel, Ondřej Bojar, Maja Popović
Recently the first methods of automatic diagnostics of machine translation have emerged; since this area of research is relatively young, the efforts are not coordinated.
no code implementations • LREC 2012 • Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, Zdeněk Žabokrtský
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation.