Search Results for author: Tom Kocmi

Found 37 papers, 11 papers with code

UFAL Submissions to the IWSLT 2016 MT Track

no code implementations • IWSLT 2016 • Ondřej Bojar, Ondřej Cífka, Jindřich Helcl, Tom Kocmi, Roman Sudarikov

We present our submissions to the IWSLT 2016 machine translation task, as our first attempt to translate subtitles and one of our early experiments with neural machine translation (NMT).

Machine Translation NMT +1

CUNI Basque-to-English Submission in IWSLT18

no code implementations • IWSLT (EMNLP) 2018 • Tom Kocmi, Dušan Variš, Ondřej Bojar

We present our submission to the IWSLT18 Low Resource task focused on the translation from Basque-to-English.

Transfer Learning Translation

CUNI NMT System for WAT 2018 Translation Tasks

no code implementations • PACLIC 2018 • Tom Kocmi, Shantipriya Parida, Ondřej Bojar

NMT Translation

Findings of the 2021 Conference on Machine Translation (WMT21)

no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri

This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.

Machine Translation Translation

CUNI Submission for the Inuktitut Language in WMT News 2020

no code implementations • WMT (EMNLP) 2020 • Tom Kocmi

This paper describes CUNI submission to the WMT 2020 News Translation Shared Task for the low-resource scenario Inuktitut–English in both translation directions.

Transfer Learning Translation

Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets

1 code implementation • 29 Jan 2024 • Nikita Moghe, Arnisa Fazla, Chantal Amrhein, Tom Kocmi, Mark Steedman, Alexandra Birch, Rico Sennrich, Liane Guillou

We benchmark metric performance, assess their incremental performance over successive campaigns, and measure their sensitivity to a range of linguistic phenomena.

Benchmarking Machine Translation +3

Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies

1 code implementation • 12 Jan 2024 • Tom Kocmi, Vilém Zouhar, Christian Federmann, Matt Post

Ten years ago a single metric, BLEU, governed progress in machine translation research.

Machine Translation Translation

GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4

1 code implementation • 21 Oct 2023 • Tom Kocmi, Christian Federmann

This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect translation quality errors, specifically for the quality estimation setting without the need for human reference translations.

Translation

SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window

no code implementations • 16 Sep 2023 • Vikas Raunak, Tom Kocmi, Matt Post

This suggests that source context may provide the same information as a human reference in disambiguating source ambiguities.

Machine Translation Sentence

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References

2 code implementations • 24 May 2023 • Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Tom Kocmi, Furu Wei

Most research about natural language generation (NLG) relies on evaluation benchmarks with limited references for a sample, which may result in poor correlations with human judgements.

Machine Translation nlg evaluation +3

Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models

1 code implementation • 24 Mar 2023 • Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, DaCheng Tao

To further improve the performance of LLMs on MT quality assessment, we investigate several prompting designs, and propose a new prompting method called Error Analysis Prompting (EAPrompt) by combining Chain-of-Thoughts (Wei et al., 2022) and Error Analysis (Lu et al., 2023).

Machine Translation Natural Language Understanding +3

Large Language Models Are State-of-the-Art Evaluators of Translation Quality

3 code implementations • 28 Feb 2023 • Tom Kocmi, Christian Federmann

We describe GEMBA, a GPT-based metric for assessment of translation quality, which works both with a reference translation and without.

Translation valid

Poor Man's Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference

1 code implementation • 21 Jan 2023 • Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan

Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without seeing the reference.

Machine Translation Sentence +1

Searching for a higher power in the human evaluation of MT

no code implementations • 20 Oct 2022 • Johnny Tian-Zheng Wei, Tom Kocmi, Christian Federmann

In MT evaluation, pairwise comparisons are conducted to identify the better system.

The Reality of Multi-Lingual Machine Translation

no code implementations • 25 Feb 2022 • Tom Kocmi, Dominik Macháček, Ondřej Bojar

Machine translation is for us a prime example of deep learning applications where human skills and learning capabilities are taken as a benchmark that many try to match and surpass.

Cross-Lingual Transfer Machine Translation +2

To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation

2 code implementations • WMT (EMNLP) 2021 • Tom Kocmi, Christian Federmann, Roman Grundkiewicz, Marcin Junczys-Dowmunt, Hitokazu Matsushita, Arul Menezes

Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system's quality over another.

Machine Translation Sentence +1

On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

no code implementations • EACL (HumEval) 2021 • Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi

Recent studies emphasize the need of document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments.

Machine Translation Translation

THEaiTRE 1.0: Interactive generation of theatre play scripts

no code implementations • 17 Feb 2021 • Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka

We present the first version of a system for interactive generation of theatre play scripts.

Findings of the 2020 Conference on Machine Translation (WMT20)

no code implementations • EMNLP 2020 • Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri

In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories.

Machine Translation Translation

CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20

no code implementations • WMT (EMNLP) 2020 • Ivana Kvapilíková, Tom Kocmi, Ondřej Bojar

This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian.

Machine Translation Transfer Learning +1

Gender Coreference and Bias Evaluation at WMT 2020

1 code implementation • WMT (EMNLP) 2020 • Tom Kocmi, Tomasz Limisiewicz, Gabriel Stanovsky

Our work presents the largest evidence for the phenomenon in more than 19 systems submitted to the WMT over four diverse target languages: Czech, German, Polish, and Russian.

Machine Translation Translation

Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords

no code implementations • 6 Jul 2020 • Tom Kocmi, Martin Popel, Ondřej Bojar

We present a new release of the Czech-English parallel corpus CzEng 2.0 consisting of over 2 billion words (2 "gigawords") in each language.

THEaiTRE: Artificial Intelligence to Write a Theatre Play

no code implementations • 25 Jun 2020 • Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká

We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts.

Machine Translation Translation

Exploring Benefits of Transfer Learning in Neural Machine Translation

no code implementations • 6 Jan 2020 • Tom Kocmi

For the former scenario, we present a proof-of-concept method by reusing a model trained by other researchers.

Cross-Lingual Transfer Machine Translation +2

Efficiently Reusing Old Models Across Languages via Transfer Learning

no code implementations • EAMT 2020 • Tom Kocmi, Ondřej Bojar

To show the applicability of our method, we recycle a Transformer model trained by different researchers and use it to seed models for different language pairs.

Machine Translation NMT +2

CUNI Submission for Low-Resource Languages in WMT News 2019

no code implementations • WS 2019 • Tom Kocmi, Ondřej Bojar

This paper describes the CUNI submission to the WMT 2019 News Translation Shared Task for the low-resource languages: Gujarati-English and Kazakh-English.

Transfer Learning Translation

CUNI Submissions in WMT18

no code implementations • WS 2018 • Tom Kocmi, Roman Sudarikov, Ondřej Bojar

Our main focus was the low-resource language pair of Estonian and English for which we utilized Finnish parallel data in a simple method.

Machine Translation Translation

Trivial Transfer Learning for Low-Resource Neural Machine Translation

no code implementations • WS 2018 • Tom Kocmi, Ondřej Bojar

We present a simple transfer learning method, where we first train a "parent" model for a high-resource language pair and then continue the training on a low-resource pair only by replacing the training corpus.

Low-Resource Neural Machine Translation Transfer Learning +1

SubGram: Extending Skip-gram Word Representation with Substrings

1 code implementation • 18 Jun 2018 • Tom Kocmi, Ondřej Bojar

Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network.

SumeCzech: Large Czech News-Based Summarization Dataset

no code implementations • LREC 2018 • Milan Straka, Nikita Mediankin, Tom Kocmi, Zdeněk Žabokrtský, Vojtěch Hudeček, Jan Hajič

Document Summarization Machine Translation +1

Neural Monkey: The Current State and Beyond

no code implementations • WS 2018 • Jindřich Helcl, Jindřich Libovický, Tom Kocmi, Tomáš Musil, Ondřej Cífka, Dušan Variš, Ondřej Bojar

Image Captioning Machine Translation +3

An Exploration of Word Embedding Initialization in Deep-Learning Tasks

no code implementations • WS 2017 • Tom Kocmi, Ondřej Bojar

We support this hypothesis by observing the performance in learning lexical relations and by the fact that the network can learn to perform reasonably in its task even with fixed random embeddings.

Word Embeddings

CUNI NMT System for WAT 2017 Translation Tasks

no code implementations • WS 2017 • Tom Kocmi, Dušan Variš, Ondřej Bojar

The paper presents this year's CUNI submissions to the WAT 2017 Translation Task focusing on the Japanese-English translation, namely Scientific papers subtask, Patents subtask and Newswire subtask.

Machine Translation NMT +2