no code implementations • EAMT 2020 • Víctor M. Sánchez-Cartagena, Mikel L. Forcada, Felipe Sánchez-Martínez
Corpus-based approaches to machine translation (MT) have difficulties when the amount of parallel corpora to use for training is scarce, especially if the languages involved in the translation are highly inflected.
no code implementations • EAMT 2022 • Marta Bañón, Miquel Esplà-Gomis, Mikel L. Forcada, Cristian García-Romero, Taja Kuzman, Nikola Ljubešić, Rik van Noord, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Peter Rupnik, Vít Suchomel, Antonio Toral, Tobias van der Werff, Jaume Zaragoza
We introduce the project “MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages”, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages.
no code implementations • EAMT 2022 • Mikel L. Forcada, Pilar Sánchez-Gijón, Dorothy Kenny, Felipe Sánchez-Martínez, Juan Antonio Pérez Ortiz, Riccardo Superbo, Gema Ramírez Sánchez, Olga Torres-Hostench, Caroline Rossi
The MultitraiNMT Erasmus+ project has developed an open innovative syl-labus in machine translation, focusing on neural machine translation (NMT) and targeting both language learners and translators.
no code implementations • EAMT 2020 • Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, Julie Wall
This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet.
2 code implementations • ACL 2020 • Marta Ba{\~n}{\'o}n, Pin-zhen Chen, Barry Haddow, Kenneth Heafield, Hieu Hoang, Miquel Espl{\`a}-Gomis, Mikel L. Forcada, Amir Kamran, Faheem Kirefu, Philipp Koehn, Sergio Ortiz Rojas, Leopoldo Pla Sempere, Gema Ram{\'\i}rez-S{\'a}nchez, Elsa Sarr{\'\i}as, Marek Strelec, Brian Thompson, William Waites, Dion Wiggins, Jaume Zaragoza
We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software.
1 code implementation • EMNLP (IWSLT) 2019 • Carolina Scarton, Mikel L. Forcada, Miquel Esplà-Gomis, Lucia Specia
To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort.
no code implementations • WS 2019 • Alex Birch, ra, Barry Haddow, Ivan Tito, Antonio Valerio Miceli Barone, Rachel Bawden, Felipe S{\'a}nchez-Mart{\'\i}nez, Mikel L. Forcada, Miquel Espl{\`a}-Gomis, V{\'\i}ctor S{\'a}nchez-Cartagena, Juan Antonio P{\'e}rez-Ortiz, Wilker Aziz, Andrew Secker, Peggy van der Kreeft
no code implementations • WS 2018 • Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada
We describe the Universitat d'Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018.
no code implementations • WS 2018 • Philipp Koehn, Huda Khayrallah, Kenneth Heafield, Mikel L. Forcada
We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1{\%} and 10{\%} of high-quality data to be used to train machine translation systems.
no code implementations • WS 2018 • Mikel L. Forcada, Carolina Scarton, Lucia Specia, Barry Haddow, Alexandra Birch
A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language.
no code implementations • 7 Jul 2016 • Núria Bel, Mikel L. Forcada, Asunción Gómez-Pérez
Any public administration that produces translation data can be a provider of useful reusable data to meet its own translation needs and the ones of other public organizations and private companies that work with texts of the same domain.
no code implementations • 18 Sep 2015 • Gang Chen, Mikel L. Forcada
This paper describes a free/open-source implementation of the light sliding-window (LSW) part-of-speech tagger for the Apertium free/open-source machine translation platform.
no code implementations • EAMT 2016 • Antonio Toral, Tommi A. Pirinen, Andy Way, Gema Ram{\'\i}rez-S{\'a}nchez, Sergio Ortiz Rojas, Raphael Rubino, Miquel Espl{\`a}, Mikel L. Forcada, Vassilis Papavassiliou, Prokopis Prokopidis, Nikola Ljube{\v{s}}i{\'c}
no code implementations • 15 Jan 2014 • Felipe Sánchez-Martínez, Mikel L. Forcada
This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora.