no code implementations • EAMT 2020 • Víctor M. Sánchez-Cartagena, Mikel L. Forcada, Felipe Sánchez-Martínez
Corpus-based approaches to machine translation (MT) struggle when the parallel corpora available for training are scarce, especially if the languages involved in the translation are highly inflected.
no code implementations • EAMT 2020 • Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, Julie Wall
This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet.
no code implementations • WMT (EMNLP) 2020 • Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Jaume Zaragoza-Bernabeu, Felipe Sánchez-Martínez
This paper describes the joint submission of Universitat d’Alacant and Prompsit Language Engineering to the WMT 2020 shared task on parallel corpus filtering.
no code implementations • TRITON 2021 • Gema Ramírez-Sánchez, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Caroline Rossi, Dorothy Kenny, Riccardo Superbo, Pilar Sánchez-Gijón, Olga Torres-Hostench
The MultiTraiNMT Erasmus+ project aims at developing an open innovative syllabus in neural machine translation (NMT) for language learners and translators as multilingual citizens.
no code implementations • AMTA 2016 • John Ortega, Felipe Sánchez-Martínez, Mikel Forcada
Computer-aided translation (CAT) tools often use a translation memory (TM) as the key resource to assist translators.
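The core TM mechanism mentioned above is fuzzy matching: given a new segment, the CAT tool retrieves the most similar source segment stored in the memory and proposes its stored translation. A minimal sketch of this lookup, using Python's standard `difflib` similarity ratio over tokens (the toy memory, threshold, and function name are illustrative assumptions, not any particular tool's implementation):

```python
from difflib import SequenceMatcher

# Toy translation memory: (source segment, target segment) pairs.
TM = [
    ("the cat sleeps on the sofa", "el gato duerme en el sofá"),
    ("the dog sleeps on the bed", "el perro duerme en la cama"),
    ("open the window, please", "abre la ventana, por favor"),
]

def best_fuzzy_match(segment, tm, threshold=0.6):
    """Return the TM entry most similar to `segment` as a
    (source, target, score) triple, or None if no entry
    reaches the similarity threshold."""
    best, best_score = None, threshold
    for src, tgt in tm:
        # Token-level similarity ratio in [0, 1].
        score = SequenceMatcher(None, segment.split(), src.split()).ratio()
        if score >= best_score:
            best, best_score = (src, tgt, score), score
    return best

match = best_fuzzy_match("the cat sleeps on the couch", TM)
print(match)
```

Real CAT tools use more elaborate similarity measures (edit distance over characters or tokens, with formatting-aware normalisation), but the retrieve-and-propose loop is the same.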
no code implementations • EAMT 2022 • Mikel L. Forcada, Pilar Sánchez-Gijón, Dorothy Kenny, Felipe Sánchez-Martínez, Juan Antonio Pérez Ortiz, Riccardo Superbo, Gema Ramírez Sánchez, Olga Torres-Hostench, Caroline Rossi
The MultiTraiNMT Erasmus+ project has developed an open innovative syllabus in machine translation, focusing on neural machine translation (NMT) and targeting both language learners and translators.
no code implementations • EAMT 2022 • Peggy van der Kreeft, Alexandra Birch, Sevi Sariisik, Felipe Sánchez-Martínez, Wilker Aziz
The GoURMET project, funded by the European Commission’s H2020 program (under grant agreement 825299), develops models for machine translation, in particular for low-resourced languages.
no code implementations • 23 May 2024 • Cristian García-Romero, Miquel Esplà-Gomis, Felipe Sánchez-Martínez
Crawling parallel texts (texts that are mutual translations) from the Internet is usually done following a brute-force approach: documents are massively downloaded in an unguided process, and only a fraction of them end up leading to actual parallel content.
1 code implementation • 11 Apr 2024 • Andrés Lou, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena
The Mayan languages comprise a language family with an ancient history, millions of speakers, and immense cultural value, which nevertheless remains severely underrepresented in terms of resources and global exposure.
1 code implementation • 29 Jan 2024 • Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez
When the amount of parallel sentences available to train a neural machine translation system is scarce, a common practice is to generate new synthetic training samples from them.
no code implementations • 29 Jan 2024 • Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez
The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features.
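A common way to feed such annotations to an NMT system is to interleave each tag with its token in the source stream. The sketch below illustrates the idea only; the toy tag dictionary stands in for a real morphological analyser, and the exact annotation scheme used in the paper may differ:

```python
# Toy part-of-speech dictionary standing in for a real tagger (assumption).
POS = {"the": "DET", "cat": "NOUN", "sleeps": "VERB",
       "on": "ADP", "sofa": "NOUN"}

def interleave_tags(tokens, tags):
    """Interleave each token with its tag, producing a single
    tag-augmented stream: DET the NOUN cat VERB sleeps ..."""
    out = []
    for tok in tokens:
        out.append(tags.get(tok, "UNK"))  # dummy tag for unknown words
        out.append(tok)
    return out

src = "the cat sleeps on the sofa".split()
print(interleave_tags(src, POS))
```

Replacing the real tags with a single constant would give the "dummy tag" condition (no linguistic information), while richer morpho-syntactic description tags would simply use a larger tag inventory in the same stream.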
1 code implementation • 16 Jan 2024 • Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez
The paper presents an automatic evaluation of these techniques on four language pairs. It shows that our approach can successfully exploit monolingual texts in a TM-based CAT environment, increasing the number of useful translation proposals, and that our neural model for estimating post-editing effort makes it possible to combine translation proposals obtained from monolingual corpora with those obtained from TMs in the usual way.
1 code implementation • EMNLP 2021 • Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez
Many DA approaches aim at expanding the support of the empirical data distribution by generating new sentence pairs that contain infrequent words, thus making it closer to the true data distribution of parallel sentences.
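One simple instance of this family of DA methods is to substitute an aligned word pair in an existing training pair with an infrequent entry from a bilingual lexicon, yielding a new synthetic pair containing rare words. The sketch below is an assumed minimal illustration (toy lexicon, 1:1 word alignment), not the specific method proposed in the paper:

```python
import random

# Toy bilingual lexicon of infrequent word pairs (assumption).
RARE_LEXICON = [("platypus", "ornitorrinco"), ("anvil", "yunque")]

def augment(src_tokens, tgt_tokens, alignment, slot, lexicon, rng):
    """Create a synthetic sentence pair by substituting the aligned
    word pair at position `slot` with a rare lexicon entry."""
    s_i, t_i = alignment[slot]          # aligned source/target positions
    rare_src, rare_tgt = rng.choice(lexicon)
    new_src = src_tokens.copy()
    new_tgt = tgt_tokens.copy()
    new_src[s_i] = rare_src
    new_tgt[t_i] = rare_tgt
    return new_src, new_tgt

rng = random.Random(0)  # seeded for reproducibility
src = "the cat sleeps".split()
tgt = "el gato duerme".split()
pair = augment(src, tgt, alignment=[(1, 1)], slot=0,
               lexicon=RARE_LEXICON, rng=rng)
print(pair)
```

Because the substitution is applied consistently on both sides of the alignment, the synthetic pair remains a plausible translation while injecting words the model rarely sees.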
no code implementations • 3 Apr 2020 • Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz, Rafael C. Carrasco
Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performance than their non-hierarchical phrase-based counterparts for some language pairs.
no code implementations • WS 2018 • Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada
We describe the Universitat d'Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018.
no code implementations • 18 Jan 2014 • Felipe Sánchez-Martínez, Rafael C. Carrasco, Miguel A. Martínez-Prieto, Joaquin Adiego
For example, a bitext can be seen as a sequence of biwords (pairs of parallel words with a high probability of co-occurrence) that can be used as an intermediate representation in the compression process.
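The biword idea can be sketched as follows. This toy version pairs words by position, whereas real biword extraction relies on statistical word alignments and co-occurrence probabilities; the codeword assignment at the end merely hints at how a single symbol stream replaces two parallel ones for compression:

```python
def to_biwords(src_tokens, tgt_tokens):
    """Pair up parallel words into biwords. Naive position-based
    alignment (assumption); real systems use statistical alignments."""
    return list(zip(src_tokens, tgt_tokens))

src = "the house is big".split()
tgt = "la casa es grande".split()
biwords = to_biwords(src, tgt)
print(biwords)

# Each distinct biword gets one codeword, so the bitext becomes a
# single symbol stream instead of two separate word streams.
vocab = {bw: i for i, bw in enumerate(dict.fromkeys(biwords))}
encoded = [vocab[bw] for bw in biwords]
print(encoded)
```

Frequent biwords receive short codes under any entropy coder applied downstream, which is where the compression gain over encoding the two texts independently comes from.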
no code implementations • 15 Jan 2014 • Felipe Sánchez-Martínez, Mikel L. Forcada
This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora.
no code implementations • 16 Jun 2013 • Felipe Sánchez-Martínez, Isabel Martínez-Sempere, Xavier Ivars-Ribes, Rafael C. Carrasco
The IMPACT-es diachronic corpus of historical Spanish compiles over one hundred books (containing approximately 8 million words) in addition to a complementary lexicon which links more than 10 thousand lemmas with attestations of the different variants found in the documents.