Search Results for author: Joris Vanvinckenroye

Found 2 papers, 0 papers with code

Detecting Various Types of Noise for Neural Machine Translation

Findings (ACL) 2022 • Christian Herold, Jan Rosendahl, Joris Vanvinckenroye, Hermann Ney

The filtering and/or selection of training data is one of the core aspects to be considered when building a strong machine translation system. In their influential work, Khayrallah and Koehn (2018) investigated the impact of different types of noise on the performance of machine translation systems. In the same year, WMT introduced a shared task on parallel corpus filtering, which was repeated in the following years and resulted in many different filtering approaches being proposed. In this work, we aim to combine the recent achievements in data filtering with the original analysis of Khayrallah and Koehn (2018) and investigate whether state-of-the-art filtering systems are capable of removing all the suggested noise types. We observe that most of these types of noise can be detected with an accuracy of over 90% by modern filtering systems when operating in a well-studied, high-resource setting. However, we also find that when confronted with more refined noise categories or when working with a less common language pair, the performance of the filtering systems is far from optimal, showing that there is still room for improvement in this area of research.

Machine Translation • Translation

Data Filtering using Cross-Lingual Word Embeddings

NAACL 2021 • Christian Herold, Jan Rosendahl, Joris Vanvinckenroye, Hermann Ney

While we find that our approaches come out at the top on all three tasks, different variants perform best on different tasks.

Cross-Lingual Word Embeddings • Language Identification
