no code implementations • EMNLP (IWSLT) 2019 • Martin Popel, Christian Federmann
We describe our four NMT systems submitted to the IWSLT19 shared task in English→Czech text-to-text translation of TED talks.
no code implementations • EACL (HumEval) 2021 • Věra Kloudová, Ondřej Bojar, Martin Popel
This paper provides a quick overview of possible methods for detecting that reference translations were actually created by post-editing an MT system.
no code implementations • Findings (EMNLP) 2021 • Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák, Daniel Zeman
One can find dozens of data resources for various languages in which coreference - a relation between two or more expressions that refer to the same real-world entity - is manually annotated.
no code implementations • WMT (EMNLP) 2020 • Martin Popel
We describe our two NMT systems submitted to the WMT 2020 shared task in English<->Czech and English<->Polish news translation.
no code implementations • WMT (EMNLP) 2020 • Ulrich Germann, Roman Grundkiewicz, Martin Popel, Radina Dobreva, Nikolay Bogoychev, Kenneth Heafield
We describe the joint submission of the University of Edinburgh and Charles University, Prague, to the Czech/English track in the WMT 2020 Shared Task on News Translation.
no code implementations • WMT (EMNLP) 2021 • Petr Gebauer, Ondřej Bojar, Vojtěch Švandelík, Martin Popel
We use the latter for experiments with various backtranslation techniques.
no code implementations • LREC 2022 • Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman
Recent advances in standardization for annotated language resources have led to successful large scale efforts, such as the Universal Dependencies (UD) project for multilingual syntactically annotated data.
1 code implementation • 21 Oct 2024 • Michal Novák, Barbora Dohnalová, Miloslav Konopík, Anna Nedoluzhko, Martin Popel, Ondřej Pražák, Jakub Sido, Milan Straka, Zdeněk Žabokrtský, Daniel Zeman
The paper presents an overview of the third edition of the shared task on multilingual coreference resolution, held as part of the CRAC 2024 workshop.
1 code implementation • 29 Jul 2024 • Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondrej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Kenton Murray, Masaaki Nagata, Martin Popel, Maja Popovic, Mariya Shmatova, Steinþór Steingrímsson, Vilém Zouhar
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics.
no code implementations • 10 Apr 2024 • Martin Popel, Lucie Poláková, Michal Novák, Jindřich Helcl, Jindřich Libovický, Pavel Straňák, Tomáš Krabač, Jaroslava Hlaváčová, Mariia Anisimova, Tereza Chlaňová
We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society.
1 code implementation • 28 Nov 2023 • Vilém Zouhar, Věra Kloudová, Martin Popel, Ondřej Bojar
The overall translation quality reached by current machine translation (MT) systems for high-resourced language pairs is remarkably good.
no code implementations • 6 Mar 2023 • Petra Galuščáková, Romain Deveaud, Gabriela Gonzalez-Saez, Philippe Mulhem, Lorraine Goeuriot, Florina Piroi, Martin Popel
LongEval-Retrieval is a Web document retrieval benchmark that focuses on continuous retrieval evaluation.
no code implementations • 1 Dec 2022 • Martin Popel, Jindřich Libovický, Jindřich Helcl
We present Charles University submissions to the WMT22 General Translation Shared Task on Czech-Ukrainian and Ukrainian-Czech machine translation.
no code implementations • 29 Nov 2022 • Josef Jon, Martin Popel, Ondřej Bojar
We evaluate performance of MBR decoding compared to traditional mixed backtranslation training and we show a possible synergy when using both of the techniques simultaneously.
1 code implementation • CRAC (ACL) 2022 • Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu
The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data.
1 code implementation • WNUT (ACL) 2021 • Jakub Náplava, Martin Popel, Milan Straka, Jana Straková
We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with an external system for natural language correction.
1 code implementation • EMNLP 2021 • Vilém Zouhar, Aleš Tamchyna, Martin Popel, Ondřej Bojar
We test the natural expectation that using MT in professional translation saves human processing time.
no code implementations • 6 Jul 2020 • Tom Kocmi, Martin Popel, Ondrej Bojar
We present a new release of the Czech-English parallel corpus CzEng 2.0 consisting of over 2 billion words (2 "gigawords") in each language.
no code implementations • WS 2019 • Martin Popel, Dominik Macháček, Michal Auersperger, Ondřej Bojar, Pavel Pecina
We describe our NMT systems submitted to the WMT19 shared task in English-Czech news translation.
no code implementations • WS 2019 • Jindřich Helcl, Jindřich Libovický, Martin Popel
We present our submission to the WMT19 Robustness Task.
no code implementations • CONLL 2018 • Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov
Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
no code implementations • WS 2018 • Martin Popel
We apply a simple but effective filtering of the synthetic data.
4 code implementations • 1 Apr 2018 • Martin Popel, Ondřej Bojar
This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017).
no code implementations • CONLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi
no code implementations • WS 2016 • Rosa Gaudio, Gorka Labaka, Eneko Agirre, Petya Osenova, Kiril Simov, Martin Popel, Dieke Oele, Gertjan van Noord, Luís Gomes, João António Rodrigues, Steven Neale, João Silva, Andreia Querido, Nuno Rendeiro, António Branco
no code implementations • LREC 2016 • Nora Aranberri, Eleftherios Avramidis, Aljoscha Burchardt, Ondřej Klejch, Martin Popel, Maja Popović
This work addresses the need to aid Machine Translation (MT) development cycles with a complete workflow of MT evaluation methods.
no code implementations • LREC 2016 • Arantxa Otegi, Nora Aranberri, Antonio Branco, Jan Hajič, Martin Popel, Kiril Simov, Eneko Agirre, Petya Osenova, Rita Pereira, João Silva, Steven Neale
This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference.
no code implementations • LREC 2014 • Rudolf Rosa, Jan Mašek, David Mareček, Martin Popel, Daniel Zeman, Zdeněk Žabokrtský
We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank).
no code implementations • LREC 2012 • Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, Aleš Tamchyna
CzEng 1.0 is automatically aligned at the level of sentences as well as words.
no code implementations • LREC 2012 • Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič
We propose HamleDT ― HArmonized Multi-LanguagE Dependency Treebank.