Search Results for author: Felipe Sánchez-Martínez

Found 18 papers, 4 papers with code

A multi-source approach for Breton–French hybrid machine translation

no code implementations EAMT 2020 Víctor M. Sánchez-Cartagena, Mikel L. Forcada, Felipe Sánchez-Martínez

Corpus-based approaches to machine translation (MT) have difficulties when the amount of parallel corpora to use for training is scarce, especially if the languages involved in the translation are highly inflected.

Data Augmentation Machine Translation +2

An English-Swahili parallel corpus and its use for neural machine translation in the news domain

no code implementations EAMT 2020 Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, Julie Wall

This paper describes our approach to creating a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet.

Machine Translation Translation

GoURMET – Machine Translation for Low-Resourced Languages

no code implementations EAMT 2022 Peggy van der Kreeft, Alexandra Birch, Sevi Sariisik, Felipe Sánchez-Martínez, Wilker Aziz

The GoURMET project, funded by the European Commission’s H2020 program (under grant agreement 825299), develops models for machine translation, in particular for low-resourced languages.

Machine Translation Translation

Curated Datasets and Neural Models for Machine Translation of Informal Registers between Mayan and Spanish Vernaculars

2 code implementations 11 Apr 2024 Andrés Lou, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena

The Mayan languages comprise a language family with an ancient history, millions of speakers, and immense cultural value, that, nevertheless, remains severely underrepresented in terms of resources and global exposure.

Machine Translation Translation

Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

1 code implementation 29 Jan 2024 Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

When the number of parallel sentences available to train a neural machine translation system is scarce, a common practice is to generate new synthetic training samples from them.

Machine Translation Translation

Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation

no code implementations 29 Jan 2024 Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

The study covers eight language pairs, different training corpus sizes, two architectures, and three types of annotation: dummy tags (with no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of part of speech and morphological features.

Machine Translation TAG

Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation

1 code implementation 16 Jan 2024 Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

The paper presents an automatic evaluation of these techniques on four language pairs that shows that our approach can successfully exploit monolingual texts in a TM-based CAT environment, increasing the amount of useful translation proposals, and that our neural model for estimating the post-editing effort enables the combination of translation proposals obtained from monolingual corpora and from TMs in the usual way.

Sentence Sentence Embeddings +1

Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

1 code implementation EMNLP 2021 Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Many DA approaches aim at expanding the support of the empirical data distribution by generating new sentence pairs that contain infrequent words, thus making it closer to the true data distribution of parallel sentences.

Data Augmentation Low-Resource Neural Machine Translation +3

Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation

no code implementations 3 Apr 2020 Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz, Rafael C. Carrasco

Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have performed better than their non-hierarchical phrase-based counterparts for some language pairs.

Clustering Machine Translation +1

Generalized Biwords for Bitext Compression and Translation Spotting

no code implementations 18 Jan 2014 Felipe Sánchez-Martínez, Rafael C. Carrasco, Miguel A. Martínez-Prieto, Joaquin Adiego

For example, a bitext can be seen as a sequence of biwords (pairs of parallel words with a high probability of co-occurrence) that can be used as an intermediate representation in the compression process.

Translation

Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora

no code implementations 15 Jan 2014 Felipe Sánchez-Martínez, Mikel L. Forcada

This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora.

Machine Translation Sentence +2

An open diachronic corpus of historical Spanish: annotation criteria and automatic modernisation of spelling

no code implementations 16 Jun 2013 Felipe Sánchez-Martínez, Isabel Martínez-Sempere, Xavier Ivars-Ribes, Rafael C. Carrasco

The IMPACT-es diachronic corpus of historical Spanish compiles over one hundred books (containing approximately 8 million words) in addition to a complementary lexicon which links more than 10 thousand lemmas with attestations of the different variants found in the documents.

LEMMA Machine Translation +1
