no code implementations • RANLP 2021 • Thierry Etchegoyhen, David Ponce, Harritxu Gete, Victor Ruiz
Adaptive Machine Translation purports to dynamically include user feedback to improve translation quality.
1 code implementation • LREC 2022 • Harritxu Gete, Thierry Etchegoyhen, David Ponce, Gorka Labaka, Nora Aranberri, Ander Corral, Xabier Saralegi, Igor Ellakuria, Maite Martin
Document-level Neural Machine Translation aims to increase the quality of neural translation models by taking into account contextual information.
no code implementations • EAMT 2020 • Ēriks Ajausks, Victoria Arranz, Laurent Bié, Aleix Cerdà-i-Cucó, Khalid Choukri, Montse Cuadros, Hans Degroote, Amando Estela, Thierry Etchegoyhen, Mercedes García-Martínez, Aitor García-Pablos, Manuel Herranz, Alejandro Kohan, Maite Melero, Mike Rosner, Roberts Rozis, Patrick Paroubek, Artūrs Vasiļevskis, Pierre Zweigenbaum
We describe the MAPA project, funded under the Connecting Europe Facility programme, whose goal is the development of an open-source de-identification toolkit for all official European Union languages.
no code implementations • 9 Feb 2024 • Harritxu Gete, Thierry Etchegoyhen
Standard context-aware neural machine translation (NMT) typically relies on parallel document-level data, exploiting both source and target contexts.
no code implementations • 18 Dec 2023 • David Ponce, Thierry Etchegoyhen, Jesús Calleja Pérez, Harritxu Gete
Our results provide a fine-grained analysis of the potential and limitations of large language models for SPRP, with significant improvements achievable using relatively small amounts of training data and model parameters overall, and remaining limitations for all models on the task.
no code implementations • LREC 2020 • Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, José Luis Fonseca, Patricia Fonseca, Paulo Vale, Jane Dunne, Federico Gaspari, Teresa Lynn, Helen McHugh, Andy Way, Victoria Arranz, Khalid Choukri, Hervé Pusset, Alexandre Sicard, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, Luís Gomes
We describe the European Language Resource Infrastructure (ELRI), a decentralised network to help collect, prepare and share language resources.
no code implementations • LREC 2020 • Thierry Etchegoyhen, Harritxu Gete
We present the results of a case study in the exploitation of comparable corpora for Neural Machine Translation.
no code implementations • LREC 2020 • Thierry Etchegoyhen, Harritxu Gete
We present a comparative evaluation of casing methods for Neural Machine Translation, to help establish an optimal pre- and post-processing methodology.
no code implementations • WS 2018 • Thierry Etchegoyhen, Eva Martínez Garcia, Andoni Azpeitia
We describe Vicomtech's participation in the WMT 2018 shared task on quality estimation, for which we submitted minimalist quality estimators.
no code implementations • WS 2018 • Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia
To address the specifics of the corpus filtering task, which features significant volumes of noisy data, the core method was expanded with a penalty based on the amount of unknown words in sentence pairs.
no code implementations • WS 2017 • Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia
This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora.
no code implementations • LREC 2016 • Thierry Etchegoyhen, Andoni Azpeitia, Naiara Pérez
The EITB corpus, a strongly comparable corpus in the news domain, is to be shared with the research community, as an aid for the development and testing of methods in comparable corpora exploitation, and as basis for the improvement of data-driven machine translation systems for this language pair.
no code implementations • LREC 2014 • Thierry Etchegoyhen, Lindsay Bywood, Mark Fishel, Panayota Georgakopoulou, Jie Jiang, Gerard van Loenhout, Arantza del Pozo, Mirjam Sepesy Maučec, Anja Turner, Martin Volk
This article describes a large-scale evaluation of the use of Statistical Machine Translation for professional subtitling.