Search Results for author: Thierry Etchegoyhen

Found 18 papers, 1 paper with code

STACC, OOV Density and N-gram Saturation: Vicomtech's Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering

no code implementations WS 2018 Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia

To address the specifics of the corpus filtering task, which features significant volumes of noisy data, the core method was expanded with a penalty based on the number of unknown words in sentence pairs.

Tasks: Machine Translation, Outlier Detection +1
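
Below is a minimal sketch of the scoring idea described in the entry above: a set-theoretic (Jaccard-style) overlap expanded with a penalty on unknown words. The toy lexicon, vocabulary, and `oov_weight` parameter are illustrative assumptions, not the exact Vicomtech system.

```python
# Hedged sketch of a STACC-style score with an OOV penalty for corpus filtering.
# `lexicon` (toy bilingual dictionary), `vocab`, and `oov_weight` are
# illustrative placeholders; the published system's expansion and weighting differ.

def expand(tokens: set[str], lexicon: dict[str, set[str]]) -> set[str]:
    """Expand source tokens with their lexical translations."""
    expanded = set(tokens)
    for t in tokens:
        expanded |= lexicon.get(t, set())
    return expanded

def filtering_score(src: str, tgt: str,
                    lexicon: dict[str, set[str]],
                    vocab: set[str],
                    oov_weight: float = 0.5) -> float:
    src_tok = set(src.lower().split())
    tgt_tok = set(tgt.lower().split())
    candidates = expand(src_tok, lexicon)
    union = candidates | tgt_tok
    base = len(candidates & tgt_tok) / len(union) if union else 0.0  # Jaccard overlap
    # Penalty on unknown words, which are frequent in noisy web-crawled pairs.
    all_tok = src_tok | tgt_tok
    oov_ratio = sum(1 for t in all_tok if t not in vocab) / max(1, len(all_tok))
    return base - oov_weight * oov_ratio
```

In a filtering pipeline, sentence pairs scoring below a chosen threshold would be discarded.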

Weighted Set-Theoretic Alignment of Comparable Sentences

no code implementations WS 2017 Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia

This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora.

Tasks: Machine Translation, Sentence
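
As a rough illustration of weighted set-theoretic alignment, the sketch below scores sentence pairs with a weighted Jaccard overlap; the IDF-style token weights are an assumption and may differ from STACCw's actual weighting.

```python
import math

# Illustrative weighted set-theoretic overlap; the IDF-style token weights are
# an assumption, not necessarily the weighting used by STACCw.

def idf_weights(corpus: list[set[str]]) -> dict[str, float]:
    """Weight tokens by inverse document frequency over a tokenised corpus."""
    n = len(corpus)
    df: dict[str, int] = {}
    for doc in corpus:
        for t in doc:
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / d) for t, d in df.items()}

def weighted_jaccard(a: set[str], b: set[str], w: dict[str, float]) -> float:
    """Weighted Jaccard: shared-token weight mass over total weight mass."""
    total = sum(w.get(t, 0.0) for t in a | b)
    if total == 0.0:
        return 0.0
    return sum(w.get(t, 0.0) for t in a & b) / total
```

Weighting lets rare, informative tokens dominate the score rather than frequent function words.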

Exploiting a Large Strongly Comparable Corpus

no code implementations LREC 2016 Thierry Etchegoyhen, Andoni Azpeitia, Naiara Pérez

The EITB corpus, a strongly comparable corpus in the news domain, is to be shared with the research community as an aid for developing and testing methods in comparable corpora exploitation, and as a basis for improving data-driven machine translation systems for this language pair.

Tasks: Machine Translation, Translation

To Case or not to case: Evaluating Casing Methods for Neural Machine Translation

no code implementations LREC 2020 Thierry Etchegoyhen, Harritxu Gete

We present a comparative evaluation of casing methods for Neural Machine Translation, to help establish an optimal pre- and post-processing methodology.

Tasks: Machine Translation, Translation
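
One casing strategy such an evaluation typically covers is lowercasing for training and inference followed by recasing as a post-processing step. The frequency-based recaser below is a minimal, hypothetical baseline, not the method evaluated in the paper.

```python
from collections import Counter, defaultdict

# Minimal lowercase-then-recase pipeline; the frequency-based recaser is a
# hypothetical baseline, not the casing method evaluated in the paper.

def build_recaser(cased_corpus: list[str]) -> dict[str, str]:
    """Map each lowercased token to its most frequent cased surface form."""
    forms: defaultdict[str, Counter] = defaultdict(Counter)
    for line in cased_corpus:
        for tok in line.split():
            forms[tok.lower()][tok] += 1
    return {low: counts.most_common(1)[0][0] for low, counts in forms.items()}

def recase(lowercased_output: str, recaser: dict[str, str]) -> str:
    """Restore casing on lowercased MT output, plus a sentence-initial capital."""
    tokens = [recaser.get(t, t) for t in lowercased_output.split()]
    if tokens:
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)
```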

Split and Rephrase with Large Language Models

no code implementations 18 Dec 2023 David Ponce, Thierry Etchegoyhen, Jesús Calleja Pérez, Harritxu Gete

Our results provide a fine-grained analysis of the potential and limitations of large language models for SPRP: significant improvements are achievable with relatively small amounts of training data and model parameters, although all models retain limitations on the task.

Tasks: In-Context Learning, Split and Rephrase
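
A few-shot prompt is one way to apply in-context learning to Split and Rephrase. The instruction wording and the demonstration pair below are illustrative assumptions, not the prompts used in the paper.

```python
# Illustrative few-shot prompt for Split and Rephrase (SPRP); the instruction
# wording and the demonstration pair are assumptions, not the paper's prompts.

DEMOS = [
    ("The hotel, which opened in 1900, sits by the lake.",
     "The hotel opened in 1900. It sits by the lake."),
]

def build_sprp_prompt(sentence: str) -> str:
    """Assemble an in-context learning prompt from demonstration pairs."""
    parts = ["Rewrite each complex sentence as several short sentences "
             "that preserve its meaning.\n"]
    for complex_s, simple_s in DEMOS:
        parts.append(f"Complex: {complex_s}\nSimple: {simple_s}\n")
    parts.append(f"Complex: {sentence}\nSimple:")
    return "\n".join(parts)
```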

Promoting Target Data in Context-aware Neural Machine Translation

no code implementations 9 Feb 2024 Harritxu Gete, Thierry Etchegoyhen

Standard context-aware neural machine translation (NMT) typically relies on parallel document-level data, exploiting both source and target contexts.

Tasks: Machine Translation, NMT +1
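
Context-aware NMT is often implemented by concatenating preceding sentences to the current one with a break token on the source and/or target side. The `<brk>` marker and one-sentence context window below are assumed conventions, not necessarily the paper's configuration.

```python
# Illustrative concatenation-based formatting for context-aware NMT training data.
# The `<brk>` separator and one-sentence context window are assumed conventions.

BRK = "<brk>"

def make_context_pair(doc_src: list[str], doc_tgt: list[str], i: int,
                      context_size: int = 1) -> tuple[str, str]:
    """Build a (source, target) pair with preceding sentences prepended as context."""
    lo = max(0, i - context_size)
    src = f" {BRK} ".join(doc_src[lo:i + 1])
    tgt = f" {BRK} ".join(doc_tgt[lo:i + 1])  # target-side context included
    return src, tgt
```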
