1 code implementation • 4 Sep 2019 • Tereza Vojtěchová, Michal Novák, Miloš Klouček, Ondřej Bojar
This paper describes a machine translation test set of documents from the auditing domain and its use as one of the "test suites" in the WMT19 News Translation Task for translation directions involving Czech, English and German.
1 code implementation • NAACL 2021 • Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya
Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility.
no code implementations • WMT (EMNLP) 2021 • Josef Jon, Michal Novák, João Paulo Aires, Dušan Variš, Ondřej Bojar
This paper describes the Charles University submission for the Multilingual Low-Resource Translation for Indo-European Languages shared task at WMT21.
no code implementations • WMT (EMNLP) 2021 • Josef Jon, Michal Novák, João Paulo Aires, Dušan Variš, Ondřej Bojar
Our approach is based on providing the desired translations alongside the input sentence and training the model to use these provided terms.
1 code implementation • CRAC (ACL) 2022 • Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu
The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data.
no code implementations • 7 Aug 2023 • Josef Jon, Dušan Variš, Michal Novák, João Paulo Aires, Ondřej Bojar
This paper explores negative lexical constraining in English to Czech neural machine translation.
no code implementations • 10 Apr 2024 • Martin Popel, Lucie Poláková, Michal Novák, Jindřich Helcl, Jindřich Libovický, Pavel Straňák, Tomáš Krabač, Jaroslava Hlaváčová, Mariia Anisimova, Tereza Chlaňová
We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society.
no code implementations • Findings (EMNLP) 2021 • Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák, Daniel Zeman
One can find dozens of data resources for various languages in which coreference (a relation between two or more expressions that refer to the same real-world entity) is manually annotated.
no code implementations • LREC 2022 • Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman
Recent advances in standardization for annotated language resources have led to successful large-scale efforts, such as the Universal Dependencies (UD) project for multilingual syntactically annotated data.