no code implementations • LREC 2020 • Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, Eva Martínez Garcia, José Luis Fonseca, Patricia Fonseca, Paulo Vale, Jane Dunne, Federico Gaspari, Teresa Lynn, Helen McHugh, Andy Way, Victoria Arranz, Khalid Choukri, Hervé Pusset, Alex Sicard, re, Rui Neto, Maite Melero, David Perez, António Branco, Ruben Branco, Luís Gomes
We describe the European Language Resource Infrastructure (ELRI), a decentralised network to help collect, prepare and share language resources.
no code implementations • LREC 2020 • Eva Martínez Garcia, Álvaro García Tejedor
Regarding domain adaptation, results show that using in-domain data helps systems achieve better translation quality.
no code implementations • WS 2019 • Eva Martínez Garcia, Carles Creus, Cristina España-Bonet
This work presents a decoding architecture that fuses the information from a neural translation model and the context semantics enclosed in a semantic space language model based on word embeddings.
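As a rough illustration only, the sketch below shows one way a decoder could fuse a neural translation score with a word-embedding context score. It assumes a simple log-linear interpolation of the NMT log-probability with the cosine similarity between a candidate word's embedding and the averaged embedding of the context words. The functions `fused_score` and `rerank`, the weight `lam` and the toy vectors are all hypothetical, and the paper's actual scoring may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def context_vector(words, emb):
    """Average the embeddings of the context words that have one (None if none do)."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def fused_score(nmt_logprob, candidate, context, emb, lam=0.8):
    """Hypothetical fusion: lam * NMT log-prob + (1 - lam) * semantic similarity."""
    ctx = context_vector(context, emb)
    if ctx is None or candidate not in emb:
        sem = 0.0  # no semantic evidence available: fall back to the NMT score alone
    else:
        sem = cosine(emb[candidate], ctx)
    return lam * nmt_logprob + (1 - lam) * sem


def rerank(candidates, context, emb, lam=0.8):
    """Sort candidate words (dict word -> NMT log-prob) by their fused score, best first."""
    return sorted(
        candidates,
        key=lambda w: fused_score(candidates[w], w, context, emb, lam),
        reverse=True,
    )


# Toy 2-d embeddings (hypothetical): "shore" lies close to the context word "river".
emb = {"river": [0.9, 0.1], "money": [0.0, 1.0], "shore": [1.0, 0.05]}
candidates = {"money": -1.0, "shore": -1.1}  # the NMT model slightly prefers "money"
print(rerank(candidates, ["river"], emb))             # semantic evidence promotes "shore"
print(rerank(candidates, ["river"], emb, lam=1.0))    # pure NMT ranking keeps "money" first
```

With `lam=1.0` the ranking is driven by the translation model alone; lowering `lam` lets the context semantics overturn a narrow NMT preference. Mixing a log-probability with a cosine score ignores their different scales, which a real system would need to calibrate.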
no code implementations • WS 2018 • Thierry Etchegoyhen, Eva Martínez Garcia, Andoni Azpeitia
We describe Vicomtech's participation in the WMT 2018 shared task on quality estimation, for which we submitted minimalist quality estimators.
no code implementations • WS 2018 • Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia
To address the specifics of the corpus filtering task, which features significant volumes of noisy data, the core method was expanded with a penalty based on the number of unknown words in sentence pairs.
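A minimal sketch of how an unknown-word penalty might be applied to a sentence-pair score follows. It assumes, purely for illustration, that the base score is the Jaccard overlap between the target tokens and the source tokens mapped through a bilingual lexicon, and that the penalty scales this score by the fraction of tokens found in the vocabularies. The lexicon, vocabularies and the `pair_score` helper are hypothetical and are not the authors' actual method.

```python
def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def unknown_ratio(src_tokens, tgt_tokens, src_vocab, tgt_vocab):
    """Fraction of tokens, over both sides of the pair, that are out of vocabulary."""
    total = len(src_tokens) + len(tgt_tokens)
    if total == 0:
        return 1.0  # an empty pair is treated as entirely unknown
    unk = sum(1 for t in src_tokens if t not in src_vocab)
    unk += sum(1 for t in tgt_tokens if t not in tgt_vocab)
    return unk / total


def pair_score(src, tgt, lexicon, src_vocab, tgt_vocab):
    """Hypothetical score: lexical overlap, scaled down by the share of unknown words."""
    src_tokens = src.lower().split()
    tgt_tokens = tgt.lower().split()
    # Map source tokens through the lexicon; unmapped tokens are kept verbatim.
    translated = {lexicon.get(t, t) for t in src_tokens}
    base = jaccard(translated, set(tgt_tokens))
    penalty = 1.0 - unknown_ratio(src_tokens, tgt_tokens, src_vocab, tgt_vocab)
    return base * penalty


# Toy Spanish-English resources (hypothetical).
lexicon = {"el": "the", "gato": "cat", "negro": "black"}
src_vocab = set(lexicon)
tgt_vocab = {"the", "cat", "black"}

clean = pair_score("el gato negro", "the black cat", lexicon, src_vocab, tgt_vocab)
noisy = pair_score("el gato xqzt", "the cat zzkw", lexicon, src_vocab, tgt_vocab)
print(clean, noisy)  # the noisy pair is penalised for its out-of-vocabulary tokens
```

A multiplicative penalty keeps scores in [0, 1] and drives pairs made mostly of unknown tokens toward zero, which suits filtering noisy web-crawled data; an additive penalty or a hard threshold would be equally plausible design choices.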
no code implementations • WS 2017 • Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia
This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora.
no code implementations • LREC 2016 • Iñaki San Vicente, Iñaki Alegría, Cristina España-Bonet, Pablo Gamallo, Hugo Gonçalo Oliveira, Eva Martínez Garcia, Antonio Toral, Arkaitz Zubiaga, Nora Aranberri
We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula.