Search Results for author: Tom Kocmi

Found 34 papers, 8 papers with code

UFAL Submissions to the IWSLT 2016 MT Track

no code implementations IWSLT 2016 Ondřej Bojar, Ondřej Cífka, Jindřich Helcl, Tom Kocmi, Roman Sudarikov

We present our submissions to the IWSLT 2016 machine translation task, as our first attempt to translate subtitles and one of our early experiments with neural machine translation (NMT).

Machine Translation NMT +1

CUNI Submission for the Inuktitut Language in WMT News 2020

no code implementations WMT (EMNLP) 2020 Tom Kocmi

This paper describes CUNI submission to the WMT 2020 News Translation Shared Task for the low-resource scenario Inuktitut–English in both translation directions.

Transfer Learning Translation

CUNI Basque-to-English Submission in IWSLT18

no code implementations IWSLT (EMNLP) 2018 Tom Kocmi, Dušan Variš, Ondřej Bojar

We present our submission to the IWSLT18 Low Resource task focused on the translation from Basque-to-English.

Transfer Learning Translation

Findings of the 2021 Conference on Machine Translation (WMT21)

no code implementations WMT (EMNLP) 2021 Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri

This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.

Machine Translation Test +1

GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4

1 code implementation 21 Oct 2023 Tom Kocmi, Christian Federmann

This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect translation quality errors, specifically for the quality estimation setting without the need for human reference translations.
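To illustrate the kind of span-level scoring such a metric produces, here is a minimal sketch of turning MQM-style error annotations into a segment score. The response format and severity weights are illustrative assumptions, not the paper's exact prompt or weighting.

```python
# Sketch of scoring MQM-style error annotations, as a GPT-based
# metric in the GEMBA-MQM spirit might parse them. The line format
# and severity penalties below are assumptions for illustration.
SEVERITY_PENALTY = {"critical": 10, "major": 5, "minor": 1}  # assumed weights

def mqm_score(llm_response: str) -> int:
    """Turn lines like 'major: accuracy/mistranslation - "went"' into a negative score."""
    penalty = 0
    for line in llm_response.splitlines():
        severity = line.split(":", 1)[0].strip().lower()
        if severity in SEVERITY_PENALTY:
            penalty += SEVERITY_PENALTY[severity]
    return -penalty

response = 'major: accuracy/mistranslation - "went"\nminor: fluency/punctuation - ","'
print(mqm_score(response))  # -6
```

Aggregating per-severity penalties rather than asking for a single number is the key design choice: it keeps the metric interpretable at the error-span level.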


Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models: A Case Study on ChatGPT

1 code implementation 24 Mar 2023 Qingyu Lu, Baopu Qiu, Liang Ding, Kanjian Zhang, Tom Kocmi, DaCheng Tao

Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks, such as machine translation and text summarization.

Machine Translation Natural Language Understanding +3

Large Language Models Are State-of-the-Art Evaluators of Translation Quality

3 code implementations 28 Feb 2023 Tom Kocmi, Christian Federmann

We describe GEMBA, a GPT-based metric for assessment of translation quality, which works both with a reference translation and without.

Translation valid
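A minimal sketch of the two modes the abstract mentions, reference-based and reference-free. The prompt wording here is a hypothetical stand-in, not the paper's actual template; only the structure (optional reference, numeric score extraction) is meant to be illustrative.

```python
import re
from typing import Optional

# Illustrative sketch of a GEMBA-style direct-assessment prompt.
# The exact wording in the paper differs; this only shows the
# reference-based vs. reference-free variants and score extraction.
def build_prompt(source: str, hypothesis: str, reference: Optional[str] = None) -> str:
    lines = [
        "Score the following translation from 0 to 100.",
        f"Source: {source}",
        f"Translation: {hypothesis}",
    ]
    if reference is not None:  # reference-based variant
        lines.insert(2, f"Reference: {reference}")
    lines.append("Score:")
    return "\n".join(lines)

def extract_score(response: str) -> int:
    """Pull the first integer out of the model's reply."""
    match = re.search(r"\d+", response)
    if match is None:
        raise ValueError("no score found in response")
    return int(match.group())

print(extract_score("Score: 87"))  # 87
```

Making the reference optional in a single prompt builder is what lets one metric serve both the reference-based and quality-estimation settings.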

Searching for a higher power in the human evaluation of MT

no code implementations 20 Oct 2022 Johnny Tian-Zheng Wei, Tom Kocmi, Christian Federmann

In MT evaluation, pairwise comparisons are conducted to identify the better system.
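The statistical-power question behind this paper can be made concrete with a toy simulation: how often does a one-sided sign test detect system A beating system B when A truly wins 55% of head-to-head judgments? All parameters below are illustrative, not taken from the paper.

```python
import random
from math import sqrt

# Toy power simulation for a pairwise MT comparison via a sign test,
# using the normal approximation to the binomial. Illustrative only.
def sign_test_power(p_win=0.55, n_judgments=300, trials=2000, alpha_z=1.645, seed=0):
    rng = random.Random(seed)
    detections = 0
    for _ in range(trials):
        wins = sum(rng.random() < p_win for _ in range(n_judgments))
        z = (wins - n_judgments / 2) / sqrt(n_judgments / 4)
        if z > alpha_z:  # one-sided alpha = 0.05
            detections += 1
    return detections / trials

power = sign_test_power()
print(round(power, 2))
```

Even with 300 judgments, a modest true win rate leaves the test far from reliable detection, which is exactly the kind of under-powering the paper's title alludes to.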

The Reality of Multi-Lingual Machine Translation

no code implementations25 Feb 2022 Tom Kocmi, Dominik Macháček, Ondřej Bojar

Machine translation is for us a prime example of deep learning applications where human skills and learning capabilities are taken as a benchmark that many try to match and surpass.

Cross-Lingual Transfer Machine Translation +2

On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

no code implementations EACL (HumEval) 2021 Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi

Recent studies emphasize the need of document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments.

Machine Translation Translation

CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20

no code implementations WMT (EMNLP) 2020 Ivana Kvapilíková, Tom Kocmi, Ondřej Bojar

This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian.

Machine Translation Transfer Learning +1

Gender Coreference and Bias Evaluation at WMT 2020

1 code implementation WMT (EMNLP) 2020 Tom Kocmi, Tomasz Limisiewicz, Gabriel Stanovsky

Our work presents the largest evidence for the phenomenon in more than 19 systems submitted to the WMT over four diverse target languages: Czech, German, Polish, and Russian.

Machine Translation Translation

Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords

no code implementations 6 Jul 2020 Tom Kocmi, Martin Popel, Ondřej Bojar

We present a new release of the Czech-English parallel corpus CzEng 2.0, consisting of over 2 billion words (2 "gigawords") in each language.

Exploring Benefits of Transfer Learning in Neural Machine Translation

no code implementations 6 Jan 2020 Tom Kocmi

For the former scenario, we present a proof-of-concept method by reusing a model trained by other researchers.

Cross-Lingual Transfer Machine Translation +2

Efficiently Reusing Old Models Across Languages via Transfer Learning

no code implementations EAMT 2020 Tom Kocmi, Ondřej Bojar

To show the applicability of our method, we recycle a Transformer model trained by different researchers and use it to seed models for different language pairs.

Machine Translation NMT +2

CUNI Submission for Low-Resource Languages in WMT News 2019

no code implementations WS 2019 Tom Kocmi, Ondřej Bojar

This paper describes the CUNI submission to the WMT 2019 News Translation Shared Task for the low-resource languages: Gujarati-English and Kazakh-English.

Transfer Learning Translation

CUNI Submissions in WMT18

no code implementations WS 2018 Tom Kocmi, Roman Sudarikov, Ondřej Bojar

Our main focus was the low-resource language pair of Estonian and English for which we utilized Finnish parallel data in a simple method.

Machine Translation Translation

Trivial Transfer Learning for Low-Resource Neural Machine Translation

no code implementations WS 2018 Tom Kocmi, Ondřej Bojar

We present a simple transfer learning method, where we first train a "parent" model for a high-resource language pair and then continue the training on a low-resource pair only by replacing the training corpus.

Low-Resource Neural Machine Translation Transfer Learning +1
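The recipe in the abstract is simple enough to sketch end to end: train on the high-resource pair, then keep the same parameters and continue training with only the corpus swapped. The training step below is a stand-in counter, not a real NMT update; the language pairs are examples.

```python
# Minimal sketch of the "trivial" transfer-learning recipe: continue
# training the *same* parameters on the low-resource pair, changing
# only the corpus. train() is a stand-in, not real NMT training.
def train(params, corpus, steps):
    for _ in range(steps):
        params["updates"] += 1          # stand-in for a gradient step
        params["last_corpus"] = corpus  # which data produced the update
    return params

params = {"updates": 0, "last_corpus": None}
params = train(params, "cs-en (high-resource parent)", steps=100)
# Child training starts from the parent's weights; only the data changes.
params = train(params, "et-en (low-resource child)", steps=20)
print(params["updates"], params["last_corpus"])  # 120 et-en (low-resource child)
```

The point of the sketch is what is *not* done: no architecture change, no vocabulary surgery, no freezing — the child simply inherits the parent's state.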

SubGram: Extending Skip-gram Word Representation with Substrings

1 code implementation 18 Jun 2018 Tom Kocmi, Ondřej Bojar

Skip-gram (word2vec) is a recent method for creating vector representations of words ("distributed word representations") using a neural network.
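The "substrings" extension can be sketched as a featurizer: represent a word by itself plus its character substrings, so rare or unseen words share features with known ones. The substring lengths and boundary markers below are illustrative choices; the paper's exact inventory may differ.

```python
# Illustrative substring featurization in the spirit of SubGram.
# min/max substring lengths and the ^...$ boundary markers are
# assumptions for this sketch, not the paper's exact settings.
def substrings(word, min_len=3, max_len=5):
    padded = f"^{word}$"  # boundary markers, a common convention
    feats = {padded}      # the whole (padded) word is itself a feature
    for n in range(min_len, max_len + 1):
        for i in range(len(padded) - n + 1):
            feats.add(padded[i:i + n])
    return feats

print(sorted(substrings("cat")))
```

Sharing substring features is what lets the model produce sensible vectors for out-of-vocabulary words, which plain skip-gram cannot do.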


An Exploration of Word Embedding Initialization in Deep-Learning Tasks

no code implementations WS 2017 Tom Kocmi, Ondřej Bojar

We support this hypothesis by observing the performance in learning lexical relations and by the fact that the network can learn to perform reasonably in its task even with fixed random embeddings.

Word Embeddings

CUNI NMT System for WAT 2017 Translation Tasks

no code implementations WS 2017 Tom Kocmi, Dušan Variš, Ondřej Bojar

The paper presents this year's CUNI submissions to the WAT 2017 Translation Task focusing on the Japanese-English translation, namely the Scientific papers subtask, Patents subtask and Newswire subtask.

Machine Translation NMT +1

Curriculum Learning and Minibatch Bucketing in Neural Machine Translation

no code implementations RANLP 2017 Tom Kocmi, Ondřej Bojar

We examine the effects of particular orderings of sentence pairs on the on-line training of neural machine translation (NMT).

Machine Translation NMT +1
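One of the orderings studied, length-based minibatch bucketing, is easy to sketch: group sentence pairs of similar length so batches waste little padding. The bucketing strategy and batch size below are illustrative, not the paper's exact setup.

```python
# Sketch of minibatch bucketing: sort pairs by source length, then
# slice consecutive runs into batches so padding waste is minimal.
# Batch size and the sort-then-slice strategy are illustrative.
def bucket_batches(pairs, batch_size=2):
    ordered = sorted(pairs, key=lambda p: len(p[0].split()))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

pairs = [
    ("a very long source sentence here", "tgt1"),
    ("short", "tgt2"),
    ("medium length source", "tgt3"),
    ("tiny", "tgt4"),
]
for batch in bucket_batches(pairs):
    print([src for src, _ in batch])
```

Note the trade-off the paper examines: bucketing changes the order in which examples are seen, which is precisely the variable curriculum learning manipulates.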

LanideNN: Multilingual Language Identification on Character Window

1 code implementation EACL 2017 Tom Kocmi, Ondřej Bojar

In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text.

Language Identification
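The character-window idea in the title can be sketched directly: classify every position of a text from a fixed-size window of surrounding characters, which is what lets a single document mix languages. The window size and space padding are illustrative choices, not the paper's configuration.

```python
# Sketch of per-character windows in the spirit of LanideNN: one
# window (and thus one language prediction) per input position.
# Window size and space padding are assumptions for this sketch.
def char_windows(text, size=5):
    pad = " " * (size // 2)
    padded = pad + text + pad
    return [padded[i:i + size] for i in range(len(text))]

for w in char_windows("ahoj", size=5):
    print(repr(w))
```

Predicting a language per position, rather than one label per document, is the design choice that handles code-switched or multilingual inputs.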
