no code implementations • EAMT 2022 • Artūrs Vasiļevskis, Jānis Ziediņš, Marko Tadić, Željka Motika, Mark Fishel, Hrafn Loftsson, Jón Guðnason, Claudia Borg, Keith Cortis, Judie Attard, Donatienne Spiteri
The work in progress on the CEF Action National Language Technology Platform (NLTP) is presented.
no code implementations • EAMT 2022 • Toms Bergmanis, Marcis Pinnis, Roberts Rozis, Jānis Šlapiņš, Valters Šics, Berta Bernāne, Guntars Pužulis, Endijs Titomers, Andre Tättar, Taido Purason, Hele-Andra Kuulmets, Agnes Luhtaru, Liisa Rätsep, Maali Tars, Annika Laumets-Tättar, Mark Fishel
We present the MTee project - a research initiative funded via an Estonian public procurement to develop machine translation technology that is open-source and free of charge.
no code implementations • ACL 2022 • Matīss Rikters, Marili Tomingas, Tuuli Tuisk, Valts Ernštreits, Mark Fishel
Livonian is one of the most endangered languages in Europe with just a tiny handful of speakers and virtually no publicly available corpora.
no code implementations • TDLE (LREC) 2022 • Marko Tadić, Daša Farkaš, Matea Filko, Artūrs Vasiļevskis, Andrejs Vasiļjevs, Jānis Ziediņš, Željka Motika, Mark Fishel, Hrafn Loftsson, Jón Guðnason, Claudia Borg, Keith Cortis, Judie Attard, Donatienne Spiteri
This article presents work in progress on the collaborative project of several European countries to develop a National Language Technology Platform (NLTP).
no code implementations • WMT (EMNLP) 2021 • Lisa Yankovskaya, Mark Fishel
The paper presents our submission to the WMT2021 Shared Task on Quality Estimation (QE).
no code implementations • WMT (EMNLP) 2020 • Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Vishrav Chaudhary, Mark Fishel, Francisco Guzmán, Lucia Specia
We explore (a) a black-box approach to QE based on pre-trained representations; and (b) glass-box approaches that leverage various indicators that can be extracted from the neural MT systems.
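One commonly used glass-box indicator of this kind is the average log-probability the MT system assigns to its own output tokens. The sketch below is illustrative only (the token probabilities are made up, not taken from the paper's systems):

```python
import math

def avg_log_prob(token_probs):
    """Glass-box QE indicator: mean log-probability of the MT output tokens.

    Values closer to 0 suggest the model was confident in its translation;
    very negative values flag potentially low-quality output.
    """
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token probabilities from an NMT decoder
confident = avg_log_prob([0.9, 0.8, 0.95, 0.85])
uncertain = avg_log_prob([0.3, 0.2, 0.5, 0.1])
print(confident > uncertain)  # the confident hypothesis scores higher
```

Because the indicator comes from the translation model itself, it requires no reference translation and no separately trained QE model.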
no code implementations • 8 Mar 2024 • Agnes Luhtaru, Taido Purason, Martin Vainikko, Maksym Del, Mark Fishel
This study explores enhancing grammatical error correction (GEC) through artificial error generation (AEG) using language models (LMs).
no code implementations • 18 Feb 2024 • Agnes Luhtaru, Martin Vainikko, Krista Liin, Kais Allkivi-Metsoja, Jaagup Kippar, Pille Eslon, Mark Fishel
To mitigate this, (1) we annotated more correction data for model training and testing, (2) we tested transfer learning, i.e., retraining machine learning models created for other tasks so as not to depend solely on correction data, and (3) we compared the developed method and model with alternatives, including large language models.
no code implementations • 20 Dec 2022 • Maksym Del, Mark Fishel
Our work introduces a challenging benchmark for future studies on reasoning in language models and contributes to a better understanding of the limits of LLMs' abilities.
1 code implementation • 4 Dec 2022 • Maksym Del, Mark Fishel
Related works used indexes like CKA and variants of CCA to measure the similarity of cross-lingual representations in multilingual language models.
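Linear CKA (centered kernel alignment), one of the indexes mentioned, can be computed in a few lines. A minimal sketch with random matrices standing in for layer representations (the data here is synthetic, not from the paper):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (n_samples x dim).

    Returns a similarity in [0, 1]; 1 means the representations match
    up to an orthogonal transform and isotropic scaling.
    """
    # Center each feature dimension
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
print(round(linear_cka(X, 2.0 * X), 4))  # 1.0: CKA is invariant to scaling
```

The invariance to isotropic scaling and orthogonal rotation is what makes CKA attractive for comparing cross-lingual representations, since different languages need not occupy identically oriented subspaces.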
1 code implementation • WMT (EMNLP) 2021 • Maksym Del, Elizaveta Korotkova, Mark Fishel
Here we analyze the sentence representations learned by NMT Transformers and show that these explicitly include information on text domains, even after seeing only the input sentences without domain labels.
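A standard way to test whether representations encode such information is a simple probe. The sketch below uses a nearest-centroid probe on synthetic vectors standing in for encoder sentence representations (the data and setup are illustrative, not the paper's actual experiment):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for encoder sentence representations from two domains:
# each domain's representations cluster around a distinct mean vector.
dim = 32
centers = rng.normal(size=(2, dim))
reps = np.concatenate([
    centers[0] + 0.3 * rng.normal(size=(100, dim)),
    centers[1] + 0.3 * rng.normal(size=(100, dim)),
])
labels = np.array([0] * 100 + [1] * 100)

# Nearest-centroid probe: if domain information is linearly recoverable,
# assigning each vector to its closest domain centroid succeeds.
centroids = np.stack([reps[labels == d].mean(axis=0) for d in (0, 1)])
dists = np.linalg.norm(reps[:, None, :] - centroids[None, :, :], axis=2)
accuracy = (dists.argmin(axis=1) == labels).mean()
print(accuracy > 0.95)
```

High probe accuracy on real encoder states would indicate, as the paper argues, that domain identity is recoverable from the representations even though no domain labels were seen in training.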
1 code implementation • 2 Sep 2021 • Maksym Del, Mark Fishel
However, we observe that Baltic languages do belong to that shared space.
3 code implementations • 21 May 2020 • Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia
Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it aims to inform the user about the quality of the MT output at test time.
no code implementations • 25 Sep 2019 • Maksym Del, Mark Fishel
Current state-of-the-art results in multilingual natural language inference (NLI) are based on tuning XLM (a pre-trained polyglot language model) separately for each language involved, resulting in multiple models.
no code implementations • WS 2019 • Erick Fonseca, Lisa Yankovskaya, André F. T. Martins, Mark Fishel, Christian Federmann
We report the results of the WMT19 shared task on Quality Estimation, i.e., the task of predicting the quality of the output of machine translation systems given just the source text and the hypothesis translations.
no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
no code implementations • WS 2019 • Elizaveta Yankovskaya, Andre Tättar, Mark Fishel
We propose the use of pre-trained embeddings as features of a regression model for sentence-level quality estimation of machine translation.
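The setup can be sketched with synthetic data standing in for sentence embeddings and human quality scores; everything here is illustrative (ordinary least squares on random features), not the authors' actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for pre-trained sentence embeddings of (source, MT output) pairs
n_pairs, dim = 200, 16
features = rng.normal(size=(n_pairs, dim))
# Synthetic "human quality scores" linearly related to the features
true_w = rng.normal(size=dim)
scores = features @ true_w + 0.1 * rng.normal(size=n_pairs)

# Sentence-level QE as a regression from embedding features to scores
w, *_ = np.linalg.lstsq(features, scores, rcond=None)
predicted = features @ w

# Pearson correlation with the gold scores, the usual QE evaluation metric
corr = np.corrcoef(predicted, scores)[0, 1]
print(corr > 0.9)
```

The appeal of this approach is that the heavy lifting is done by the pre-trained embeddings, so the QE model itself can stay small and needs comparatively little labeled data.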
no code implementations • WS 2019 • Andre Tättar, Elizaveta Korotkova, Mark Fishel
This paper describes the University of Tartu's submission to the news translation shared task of WMT19, where the core idea was to train a single multilingual system to cover several language pairs of the shared task and submit its results.
no code implementations • 27 Mar 2019 • Elizaveta Korotkova, Agnes Luhtaru, Maksym Del, Krista Liin, Daiga Deksne, Mark Fishel
Both grammatical error correction and text style transfer can be viewed as monolingual sequence-to-sequence transformation tasks, but the scarcity of directly annotated data for either task makes them infeasible for most languages.
no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
no code implementations • WS 2018 • Maksym Del, Andre Tättar, Mark Fishel
This paper describes the University of Tartu's submission to the unsupervised machine translation track of the WMT18 news translation shared task.
no code implementations • WS 2018 • Elizaveta Yankovskaya, Andre Tättar, Mark Fishel
This paper describes the submissions of the team from the University of Tartu for the sentence-level Quality Estimation shared task of WMT18.
no code implementations • WS 2018 • Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, Christof Monz
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018.
no code implementations • 1 Aug 2018 • Elizaveta Korotkova, Maksym Del, Mark Fishel
We introduce the task of zero-shot style transfer between different languages.
no code implementations • 30 Jul 2018 • Hasan Sait Arslan, Mark Fishel, Gholamreza Anbarjafari
In this paper, a doubly-attentive transformer machine translation model (DATNMT) is presented, in which a doubly-attentive transformer decoder incorporates spatial visual features obtained via pretrained convolutional neural networks, bridging the gap between image captioning and translation.
no code implementations • 6 May 2018 • Sander Tars, Mark Fishel
We present an approach to neural machine translation (NMT) that supports multiple domains in a single model and allows switching between the domains when translating.
3 code implementations • MTSummit 2017 • Matīss Rikters, Mark Fishel
Attention distributions of the generated translations are a useful bi-product of attention-based recurrent neural network translation models and can be treated as soft alignments between the input and output tokens.
Ranked #3 on Machine Translation on WMT 2017 Latvian-English
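The soft-alignment view of attention can be illustrated with a toy attention matrix (the values are made up for illustration, not produced by the paper's models):

```python
import numpy as np

# Hypothetical attention matrix from a recurrent NMT model:
# rows = output tokens, columns = input tokens, each row sums to 1.
attention = np.array([
    [0.8, 0.1, 0.1],   # output token 0 attends mostly to input token 0
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
])

# Treating attention as soft alignments, a hard alignment falls out of argmax
hard_alignment = attention.argmax(axis=1)
print(hard_alignment.tolist())  # [0, 1, 2]

# How "peaked" each row is can serve as a simple per-token confidence cue
print(attention.max(axis=1).round(1).tolist())  # [0.8, 0.7, 0.7]
```

Diffuse rows (no clearly dominant input token) are one signal such confidence-inspection methods use to flag potentially problematic translations.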
no code implementations • LREC 2014 • Thierry Etchegoyhen, Lindsay Bywood, Mark Fishel, Panayota Georgakopoulou, Jie Jiang, Gerard van Loenhout, Arantza del Pozo, Mirjam Sepesy Maučec, Anja Turner, Martin Volk
This article describes a large-scale evaluation of the use of Statistical Machine Translation for professional subtitling.
no code implementations • LREC 2012 • Mark Fishel, Ondřej Bojar, Maja Popović
Recently the first methods of automatic diagnostics of machine translation have emerged; since this area of research is relatively young, the efforts are not coordinated.
no code implementations • LREC 2012 • Volha Petukhova, Rodrigo Agerri, Mark Fishel, Sergio Penkale, Arantza del Pozo, Mirjam Sepesy Maučec, Andy Way, Panayota Georgakopoulou, Martin Volk
Subtitling and audiovisual translation have been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing, in order to increase the efficiency of the subtitle production process.
no code implementations • LREC 2012 • Jan Berka, Ondřej Bojar, Mark Fishel, Maja Popović, Daniel Zeman
We present a complex, open source tool for detailed machine translation error analysis, providing the user with automatic error detection and classification, several monolingual alignment algorithms, as well as training and test corpus browsing.