Search Results for author: Thierry Etchegoyhen

Found 18 papers, 1 paper with code

STACC, OOV Density and N-gram Saturation: Vicomtech's Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering

no code implementations WS 2018 Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia

To address the specifics of the corpus filtering task, which features significant volumes of noisy data, the core method was expanded with a penalty based on the number of unknown words in sentence pairs.

Tasks: Machine Translation, Outlier Detection +1
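
Below is a minimal sketch of the scoring idea described in the entry above: a set-theoretic (Jaccard-style) overlap expanded with a penalty on unknown words. The toy lexicon, vocabulary, and `oov_weight` parameter are illustrative assumptions, not the exact Vicomtech system.

```python
# Hedged sketch of a STACC-style score with an OOV penalty for corpus filtering.
# `lexicon` (toy bilingual dictionary), `vocab`, and `oov_weight` are
# illustrative placeholders; the published system's expansion and weighting differ.

def expand(tokens: set[str], lexicon: dict[str, set[str]]) -> set[str]:
    """Expand source tokens with their lexical translations."""
    expanded = set(tokens)
    for t in tokens:
        expanded |= lexicon.get(t, set())
    return expanded

def filtering_score(src: str, tgt: str,
                    lexicon: dict[str, set[str]],
                    vocab: set[str],
                    oov_weight: float = 0.5) -> float:
    src_tok = set(src.lower().split())
    tgt_tok = set(tgt.lower().split())
    candidates = expand(src_tok, lexicon)
    union = candidates | tgt_tok
    base = len(candidates & tgt_tok) / len(union) if union else 0.0  # Jaccard overlap
    # Penalty on unknown words, which are frequent in noisy web-crawled pairs.
    all_tok = src_tok | tgt_tok
    oov_ratio = sum(1 for t in all_tok if t not in vocab) / max(1, len(all_tok))
    return base - oov_weight * oov_ratio
```

In a filtering pipeline, sentence pairs scoring below a chosen threshold would be discarded.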

Weighted Set-Theoretic Alignment of Comparable Sentences

no code implementations WS 2017 Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia

This article presents the STACCw system for the BUCC 2017 shared task on parallel sentence extraction from comparable corpora.

Tasks: Machine Translation, Sentence
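
As a rough illustration of weighted set-theoretic alignment, the sketch below scores sentence pairs with a weighted Jaccard overlap; the IDF-style token weights are an assumption and may differ from STACCw's actual weighting.

```python
import math

# Illustrative weighted set-theoretic overlap; the IDF-style token weights are
# an assumption, not necessarily the weighting used by STACCw.

def idf_weights(corpus: list[set[str]]) -> dict[str, float]:
    """Weight tokens by inverse document frequency over a tokenised corpus."""
    n = len(corpus)
    df: dict[str, int] = {}
    for doc in corpus:
        for t in doc:
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / d) for t, d in df.items()}

def weighted_jaccard(a: set[str], b: set[str], w: dict[str, float]) -> float:
    """Weighted Jaccard: shared-token weight mass over total weight mass."""
    total = sum(w.get(t, 0.0) for t in a | b)
    if total == 0.0:
        return 0.0
    return sum(w.get(t, 0.0) for t in a & b) / total
```

Weighting lets rare, informative tokens dominate the score rather than frequent function words.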

Exploiting a Large Strongly Comparable Corpus

no code implementations LREC 2016 Thierry Etchegoyhen, Andoni Azpeitia, Naiara Pérez

The EITB corpus, a strongly comparable corpus in the news domain, is to be shared with the research community as an aid for developing and testing methods in comparable corpora exploitation, and as a basis for improving data-driven machine translation systems for this language pair.

Tasks: Machine Translation, Translation

To Case or not to case: Evaluating Casing Methods for Neural Machine Translation

no code implementations LREC 2020 Thierry Etchegoyhen, Harritxu Gete

We present a comparative evaluation of casing methods for Neural Machine Translation, to help establish an optimal pre- and post-processing methodology.

Tasks: Machine Translation, Translation
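
One casing strategy such an evaluation typically covers is lowercasing for training and inference followed by recasing as a post-processing step. The frequency-based recaser below is a minimal, hypothetical baseline, not the method evaluated in the paper.

```python
from collections import Counter, defaultdict

# Minimal lowercase-then-recase pipeline; the frequency-based recaser is a
# hypothetical baseline, not the casing method evaluated in the paper.

def build_recaser(cased_corpus: list[str]) -> dict[str, str]:
    """Map each lowercased token to its most frequent cased surface form."""
    forms: defaultdict[str, Counter] = defaultdict(Counter)
    for line in cased_corpus:
        for tok in line.split():
            forms[tok.lower()][tok] += 1
    return {low: counts.most_common(1)[0][0] for low, counts in forms.items()}

def recase(lowercased_output: str, recaser: dict[str, str]) -> str:
    """Restore casing on lowercased MT output, plus a sentence-initial capital."""
    tokens = [recaser.get(t, t) for t in lowercased_output.split()]
    if tokens:
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)
```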

Split and Rephrase with Large Language Models

no code implementations 18 Dec 2023 David Ponce, Thierry Etchegoyhen, Jesús Calleja Pérez, Harritxu Gete

Our results provide a fine-grained analysis of the potential and limitations of large language models for SPRP: significant improvements are achievable with relatively small amounts of training data and model parameters, although all models retain limitations on the task.

Tasks: In-Context Learning, Split and Rephrase
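
A few-shot prompt is one way to apply in-context learning to Split and Rephrase. The instruction wording and the demonstration pair below are illustrative assumptions, not the prompts used in the paper.

```python
# Illustrative few-shot prompt for Split and Rephrase (SPRP); the instruction
# wording and the demonstration pair are assumptions, not the paper's prompts.

DEMOS = [
    ("The hotel, which opened in 1900, sits by the lake.",
     "The hotel opened in 1900. It sits by the lake."),
]

def build_sprp_prompt(sentence: str) -> str:
    """Assemble an in-context learning prompt from demonstration pairs."""
    parts = ["Rewrite each complex sentence as several short sentences "
             "that preserve its meaning.\n"]
    for complex_s, simple_s in DEMOS:
        parts.append(f"Complex: {complex_s}\nSimple: {simple_s}\n")
    parts.append(f"Complex: {sentence}\nSimple:")
    return "\n".join(parts)
```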

Promoting Target Data in Context-aware Neural Machine Translation

no code implementations 9 Feb 2024 Harritxu Gete, Thierry Etchegoyhen

Standard context-aware neural machine translation (NMT) typically relies on parallel document-level data, exploiting both source and target contexts.

Tasks: Machine Translation, NMT +1
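
Context-aware NMT is often implemented by concatenating preceding sentences to the current one with a break token on the source and/or target side. The `<brk>` marker and one-sentence context window below are assumed conventions, not necessarily the paper's configuration.

```python
# Illustrative concatenation-based formatting for context-aware NMT training data.
# The `<brk>` separator and one-sentence context window are assumed conventions.

BRK = "<brk>"

def make_context_pair(doc_src: list[str], doc_tgt: list[str], i: int,
                      context_size: int = 1) -> tuple[str, str]:
    """Build a (source, target) pair with preceding sentences prepended as context."""
    lo = max(0, i - context_size)
    src = f" {BRK} ".join(doc_src[lo:i + 1])
    tgt = f" {BRK} ".join(doc_tgt[lo:i + 1])  # target-side context included
    return src, tgt
```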
