Search Results for author: Cristina España-Bonet

Found 21 papers, 8 papers with code

Findings of the 2021 Conference on Machine Translation (WMT21)

no code implementations WMT (EMNLP) 2021 Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri

This paper presents the results of the news translation task, the multilingual low-resource translation for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.

Machine Translation Translation

Tracing Source Language Interference in Translation with Graph-Isomorphism Measures

no code implementations RANLP 2021 Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith

Previous research has used linguistic features to show that translations exhibit traces of source language interference and that phylogenetic trees between languages can be reconstructed from the results of translations into the same language.

Open-Ended Question Answering Translation

A Simple Method for Unsupervised Bilingual Lexicon Induction for Data-Imbalanced, Closely Related Language Pairs

1 code implementation 23 May 2023 Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden

Existing approaches for unsupervised bilingual lexicon induction (BLI) often depend on good quality static or contextual embeddings trained on large monolingual corpora for both languages.

Bilingual Lexicon Induction

Explaining Translationese: why are Neural Classifiers Better and what do they Learn?

no code implementations 24 Oct 2022 Kwabena Amponsah-Kaakyire, Daria Pylypenko, Josef van Genabith, Cristina España-Bonet

Previous research did not show (i) whether the difference is because of the features, the classifiers or both, and (ii) what the neural classifiers actually learn.

Feature Engineering Representation Learning

Exploiting Social Media Content for Self-Supervised Style Transfer

1 code implementation NAACL (SocialNLP) 2022 Dana Ruiter, Thomas Kleinbauer, Cristina España-Bonet, Josef van Genabith, Dietrich Klakow

Recent research on style transfer takes inspiration from unsupervised neural machine translation (UNMT), learning from large amounts of non-parallel data by exploiting cycle consistency loss, back-translation, and denoising autoencoders.

Denoising Machine Translation +3

Towards Debiasing Translation Artifacts

1 code implementation NAACL 2022 Koel Dutta Chowdhury, Rricha Jalota, Cristina España-Bonet, Josef van Genabith

Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets.

Natural Language Inference Translation

Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction

1 code implementation 3 May 2020 Cristina España-Bonet, Alberto Barrón-Cedeño, Lluís Màrquez

Our best metric for domainness shows a strong correlation with the human-judged precision, representing a reasonable automatic alternative to assess the quality of domain-specific corpora.


Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation

no code implementations EMNLP 2020 Dana Ruiter, Josef van Genabith, Cristina España-Bonet

Self-supervised neural machine translation (SSNMT) jointly learns to identify and select suitable training data from comparable (rather than parallel) corpora and to translate, in a way that the two tasks support each other in a virtuous circle.

Denoising Machine Translation +1

GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies

1 code implementation LREC 2020 Marta R. Costa-jussà, Pau Li Lin, Cristina España-Bonet

We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information from Wikipedia biographies.

Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi

1 code implementation 5 Dec 2019 Jesujoba O. Alabi, Kwabena Amponsah-Kaakyire, David I. Adelani, Cristina España-Bonet

In this paper we focus on two African languages, Yorùbá and Twi, and compare the word embeddings obtained in this way, with word embeddings obtained from curated corpora and a language-dependent processing.

Word Embeddings

Analysing Coreference in Transformer Outputs

no code implementations WS 2019 Ekaterina Lapshinova-Koltunski, Cristina España-Bonet, Josef van Genabith

We analyse coreference phenomena in three neural machine translation systems trained with different data settings with or without access to explicit intra- and cross-sentential anaphoric information.

Machine Translation Translation

Self-Induced Curriculum Learning in Neural Machine Translation

no code implementations 25 Sep 2019 Dana Ruiter, Cristina España-Bonet, Josef van Genabith

Self-supervised neural machine translation (SS-NMT) learns how to extract/select suitable training data from comparable (rather than parallel) corpora and how to translate, in a way that the two tasks support each other in a virtuous circle.

Denoising Machine Translation +2

An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

no code implementations 18 Apr 2017 Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, Josef van Genabith

First, we systematically study the NMT context vectors, i.e. output of the encoder, and their power as an interlingua representation of a sentence.

Machine Translation NMT +2

Resolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine Translation

no code implementations 5 Aug 2016 Pranava Swaroop Madhyastha, Cristina España-Bonet

Out-of-vocabulary words account for a large proportion of errors in machine translation systems, especially when the system is used on a different domain than the one where it was trained.

Machine Translation Translation +1
