no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-jussà, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
This paper presents the results of the news translation task, the multilingual low-resource translation task for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.
no code implementations • IWSLT 2017 • Cristina España-Bonet, Josef van Genabith
This paper describes the UdS-DFKI participation in the multilingual task of the IWSLT Evaluation 2017.
no code implementations • MTSummit 2021 • David Adelani, Dana Ruiter, Jesujoba Alabi, Damilola Adebonojo, Adesina Ayeni, Mofe Adeyemi, Ayodele Esther Awokoya, Cristina España-Bonet
Massively multilingual machine translation (MT) has shown impressive capabilities, including zero- and few-shot translation between low-resource language pairs.
no code implementations • RANLP 2021 • Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith
Previous research has used linguistic features to show that translations exhibit traces of source language interference and that phylogenetic trees between languages can be reconstructed from the results of translations into the same language.
no code implementations • MTSummit 2021 • Fabrizio Nunnari, Judith Bauerdiek, Lucas Bernhard, Cristina España-Bonet, Corinna Jäger, Amelie Unger, Kristoffer Waldow, Sonja Wecker, Elisabeth André, Stephan Busemann, Christian Dold, Arnulph Fuhrmann, Patrick Gebhard, Yasser Hamidullah, Marcel Hauck, Yvonne Kossel, Martin Misiak, Dieter Wallach, Alexander Stricker
This paper presents an overview of AVASAG, an ongoing applied-research project developing a text-to-sign-language translation system for public services.
no code implementations • 10 Jan 2025 • Jesujoba O. Alabi, Israel Abebe Azime, Miaoran Zhang, Cristina España-Bonet, Rachel Bawden, Dawei Zhu, David Ifeoluwa Adelani, Clement Oyeleke Odoje, Idris Akinade, Iffat Maab, Davis David, Shamsuddeen Hassan Muhammad, Neo Putini, David O. Ademuyiwa, Andrew Caines, Dietrich Klakow
This paper introduces AFRIDOC-MT, a document-level multi-parallel translation dataset covering English and five African languages: Amharic, Hausa, Swahili, Yorùbá, and Zulu.
no code implementations • 28 Oct 2023 • Rricha Jalota, Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith
We show how we can eliminate the need for parallel validation data by combining the self-supervised loss with an unsupervised loss.
no code implementations • 25 Oct 2023 • Cristina España-Bonet
Traditional media typically adopt an editorial line that can be used by their potential readers as an indicator of the media bias.
1 code implementation • 23 May 2023 • Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden
Most existing approaches for unsupervised bilingual lexicon induction (BLI) depend on good quality static or contextual embeddings requiring large monolingual corpora for both languages.
no code implementations • 24 Oct 2022 • Kwabena Amponsah-Kaakyire, Daria Pylypenko, Josef van Genabith, Cristina España-Bonet
Previous research did not show (i) whether the difference is because of the features, the classifiers or both, and (ii) what the neural classifiers actually learn.
1 code implementation • NAACL (SocialNLP) 2022 • Dana Ruiter, Thomas Kleinbauer, Cristina España-Bonet, Josef van Genabith, Dietrich Klakow
Recent research on style transfer takes inspiration from unsupervised neural machine translation (UNMT), learning from large amounts of non-parallel data by exploiting cycle consistency loss, back-translation, and denoising autoencoders.
1 code implementation • NAACL 2022 • Koel Dutta Chowdhury, Rricha Jalota, Cristina España-Bonet, Josef van Genabith
Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets.
no code implementations • EMNLP 2021 • Daria Pylypenko, Kwabena Amponsah-Kaakyire, Koel Dutta Chowdhury, Josef van Genabith, Cristina España-Bonet
Traditional hand-crafted linguistically-informed features have often been used for distinguishing between translated and original non-translated texts.
no code implementations • MTSummit 2021 • Dana Ruiter, Dietrich Klakow, Josef van Genabith, Cristina España-Bonet
For most language combinations, parallel data is either scarce or simply unavailable.
1 code implementation • 15 Mar 2021 • David I. Adelani, Dana Ruiter, Jesujoba O. Alabi, Damilola Adebonojo, Adesina Ayeni, Mofe Adeyemi, Ayodele Awokoya, Cristina España-Bonet
We investigate how and when this training condition affects the final quality and intelligibility of a translation.
1 code implementation • 3 May 2020 • Cristina España-Bonet, Alberto Barrón-Cedeño, Lluís Màrquez
Our best metric for domainness shows a strong correlation with the human-judged precision, representing a reasonable automatic alternative to assess the quality of domain-specific corpora.
no code implementations • EMNLP 2020 • Dana Ruiter, Josef van Genabith, Cristina España-Bonet
Self-supervised neural machine translation (SSNMT) jointly learns to identify and select suitable training data from comparable (rather than parallel) corpora and to translate, in a way that the two tasks support each other in a virtuous circle.
1 code implementation • LREC 2020 • Marta R. Costa-jussà, Pau Li Lin, Cristina España-Bonet
We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information from Wikipedia biographies.
1 code implementation • 5 Dec 2019 • Jesujoba O. Alabi, Kwabena Amponsah-Kaakyire, David I. Adelani, Cristina España-Bonet
In this paper we focus on two African languages, Yorùbá and Twi, and compare the word embeddings obtained in this way with word embeddings obtained from curated corpora and a language-dependent processing.
no code implementations • WS 2019 • Ekaterina Lapshinova-Koltunski, Cristina España-Bonet, Josef van Genabith
We analyse coreference phenomena in three neural machine translation systems trained with different data settings with or without access to explicit intra- and cross-sentential anaphoric information.
no code implementations • 25 Sep 2019 • Dana Ruiter, Cristina España-Bonet, Josef van Genabith
Self-supervised neural machine translation (SS-NMT) learns how to extract/select suitable training data from comparable (rather than parallel) corpora and how to translate, in a way that the two tasks support each other in a virtuous circle.
no code implementations • 18 Apr 2017 • Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, Josef van Genabith
First, we systematically study the NMT context vectors, i.e., the output of the encoder, and their power as an interlingua representation of a sentence.
no code implementations • 5 Aug 2016 • Pranava Swaroop Madhyastha, Cristina España-Bonet
Out-of-vocabulary words account for a large proportion of errors in machine translation systems, especially when the system is used on a different domain than the one where it was trained.