Search Results for author: Marta R. Costa-jussà

Found 68 papers, 26 papers with code

The IPN-CIC team system submission for the WMT 2020 similar language task

no code implementations WMT (EMNLP) 2020 Luis A. Menéndez-Salazar, Grigori Sidorov, Marta R. Costa-jussà

This paper describes the participation of the NLP research team of the IPN Computer Research center in the WMT 2020 Similar Language Translation Task.

Domain Adaptation Translation

E-Commerce Content and Collaborative-based Recommendation using K-Nearest Neighbors and Enriched Weighted Vectors

no code implementations EcomNLP (COLING) 2020 Bardia Rafieian, Marta R. Costa-jussà

In this paper, we present two productive and functional recommender methods to improve the accuracy of predicting the right product for the user.

Recommendation Systems

Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022

1 code implementation IWSLT (ACL) 2022 Ioannis Tsiamas, Gerard I. Gállego, Carlos Escolano, José Fonollosa, Marta R. Costa-jussà

We further investigate the suitability of different speech encoders (wav2vec 2.0, HuBERT) for our models and the impact of knowledge distillation from the Machine Translation model that we use for the decoder (mBART).

Knowledge Distillation Machine Translation +2

Pushing the Limits of Zero-shot End-to-End Speech Translation

1 code implementation 16 Feb 2024 Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

The speech encoder seamlessly integrates with the MT model at inference, enabling direct translation from speech to text, across all languages supported by the MT model.

Speech-to-Text Translation Translation

MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector

1 code implementation 10 Jan 2024 Marta R. Costa-jussà, Mariano Coria Meglioli, Pierre Andrews, David Dale, Prangthip Hansanti, Elahe Kalbassi, Alex Mourachko, Christophe Ropers, Carleigh Wood

Research in toxicity detection in natural language processing for the speech modality (audio-based) is quite limited, particularly for languages other than English.

SpeechAlign: a Framework for Speech Translation Alignment Evaluation

no code implementations 20 Sep 2023 Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà

To contribute to these fields, we present SpeechAlign, a framework to evaluate the underexplored field of source-target alignment in speech models.

Speech-to-Text Translation Translation

Gender-specific Machine Translation with Large Language Models

no code implementations 6 Sep 2023 Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà

Decoder-only Large Language Models (LLMs) have demonstrated potential in machine translation (MT), albeit with performance slightly lagging behind traditional encoder-decoder Neural Machine Translation (NMT) systems.

In-Context Learning Machine Translation +2

ReSeTOX: Re-learning attention weights for toxicity mitigation in machine translation

1 code implementation 19 May 2023 Javier García Gilabert, Carlos Escolano, Marta R. Costa-jussà

Our proposed method, ReSeTOX (REdo SEarch if TOXic), addresses the issue of Neural Machine Translation (NMT) generating translation outputs that contain toxic words not present in the input.

Machine Translation NMT +1

Efficient Speech Translation with Dynamic Latent Perceivers

1 code implementation 28 Oct 2022 Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality.

Speech-to-Text Translation Translation

Toxicity in Multilingual Machine Translation at Scale

no code implementations 6 Oct 2022 Marta R. Costa-jussà, Eric Smith, Christophe Ropers, Daniel Licht, Jean Maillard, Javier Ferrando, Carlos Escolano

We evaluate and analyze added toxicity when translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences, covering 13 demographic axes) from English into 164 languages.

Hallucination Machine Translation +1

Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

1 code implementation 23 May 2022 Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà

In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step).

Machine Translation NMT +2

Measuring the Mixing of Contextual Information in the Transformer

2 code implementations 8 Mar 2022 Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

The Transformer architecture aggregates input information through the self-attention mechanism, but there is no clear understanding of how this information is mixed across the entire model.

A multi-task semi-supervised framework for Text2Graph & Graph2Text

1 code implementation 12 Feb 2022 Oriol Domingo, Marta R. Costa-jussà, Carlos Escolano

The proposed solution, a T5 architecture, is trained in a multi-task semi-supervised environment, with our collected non-parallel data, following a cycle training regime.

Information Retrieval Retrieval +1

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

2 code implementations 9 Feb 2022 Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

Speech translation datasets provide manual segmentations of the audios, which are not available in real-world scenarios, and existing segmentation methods usually significantly reduce translation quality at inference time.

Segmentation Speech-to-Text Translation +1

Efficient Transformer for Direct Speech Translation

no code implementations 7 Jul 2021 Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussà

When working with speech, we must face a problem: the sequence length of an audio input is not suitable for the Transformer.

Translation

End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

1 code implementation ACL (IWSLT) 2021 Gerard I. Gállego, Ioannis Tsiamas, Carlos Escolano, José A. R. Fonollosa, Marta R. Costa-jussà

Our submission also uses a custom segmentation algorithm that employs pre-trained Wav2Vec 2.0 for identifying periods of untranscribable text and can bring improvements of 2.5 to 3 BLEU score on the IWSLT 2019 test set, as compared to the result with the given segmentation.

Ranked #2 on Speech-to-Text Translation on MuST-C EN->DE (using extra training data)

Segmentation Speech-to-Text Translation +1

Impact of Gender Debiased Word Embeddings in Language Modeling

no code implementations 3 May 2021 Christine Basta, Marta R. Costa-jussà

Gender, race and social biases have recently been detected as evident examples of unfairness in applications of Natural Language Processing.

Fairness Language Modelling +1

How to Write a Bias Statement: Recommendations for Submissions to the Workshop on Gender Bias in NLP

no code implementations 7 Apr 2021 Christian Hardmeier, Marta R. Costa-jussà, Kellie Webster, Will Radford, Su Lin Blodgett

At the Workshop on Gender Bias in NLP (GeBNLP), we'd like to encourage authors to give explicit consideration to the wider aspects of bias and its social implications.

Sparsely Factored Neural Machine Translation

1 code implementation 17 Feb 2021 Noe Casas, Jose A. R. Fonollosa, Marta R. Costa-jussà

The standard approach to incorporate linguistic information to neural machine translation systems consists in maintaining separate vocabularies for each of the annotated features to be incorporated (e.g., POS tags, dependency relation label), embed them, and then aggregate them with each subword in the word they belong to.

Machine Translation POS +1

Continual Lifelong Learning in Natural Language Processing: A Survey

no code implementations COLING 2020 Magdalena Biesialska, Katarzyna Biesialska, Marta R. Costa-jussà

However, it is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge.

Continual Learning

Evaluating Gender Bias in Speech Translation

no code implementations LREC 2022 Marta R. Costa-jussà, Christine Basta, Gerard I. Gállego

WinoST is the speech version of WinoMT which is a MT challenge set and both follow an evaluation protocol to measure gender accuracy.

Translation

Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders

no code implementations 29 May 2020 Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe

We propose a modular architecture of language-specific encoder-decoders that constitutes a multilingual machine translation system that can be incrementally extended to new languages without the need for retraining the existing system when adding new languages.

Machine Translation Natural Language Inference +2

MT-Adapted Datasheets for Datasets: Template and Repository

no code implementations 27 May 2020 Marta R. Costa-jussà, Roger Creus, Oriol Domingo, Albert Domínguez, Miquel Escobar, Cayetana López, Marina Garcia, Margarita Geleta

In this report we are taking the standardized model proposed by Gebru et al. (2018) for documenting the popular machine translation datasets of the EuroParl (Koehn, 2005) and News-Commentary (Barrault et al., 2019).

Machine Translation Translation

Enriching the Transformer with Linguistic Factors for Low-Resource Machine Translation

no code implementations RANLP 2021 Jordi Armengol-Estapé, Marta R. Costa-jussà, Carlos Escolano

Introducing factors, that is to say, word features such as linguistic information referring to the source tokens, is known to improve the results of neural machine translation systems in certain settings, typically in recurrent architectures.

Machine Translation Translation

Refinement of Unsupervised Cross-Lingual Word Embeddings

1 code implementation 21 Feb 2020 Magdalena Biesialska, Marta R. Costa-jussà

In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1

GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies

1 code implementation LREC 2020 Marta R. Costa-jussà, Pau Li Lin, Cristina España-Bonet

We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information from Wikipedia biographies.

Sentence

The TALP-UPC System for the WMT Similar Language Task: Statistical vs Neural Machine Translation

no code implementations WS 2019 Magdalena Biesialska, Lluis Guardia, Marta R. Costa-jussà

Although the problem of similar language translation has been an area of research interest for many years, it is still far from being solved.

Machine Translation Translation

From Bilingual to Multilingual Neural Machine Translation by Incremental Training

no code implementations ACL 2019 Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa

Multilingual Neural Machine Translation approaches are based on the use of task-specific models and the addition of one more language can only be done by retraining the whole system.

Machine Translation Translation

Joint Source-Target Self Attention with Locality Constraints

2 code implementations 16 May 2019 José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà

The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over source and target sequences.

Language Modelling Machine Translation +1

Towards Interlingua Neural Machine Translation

no code implementations 15 May 2019 Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa

By adding and forcing this interlingual loss, we are able to train multiple encoders and decoders for each language, sharing a common intermediate representation.

Machine Translation Translation

Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques

1 code implementation 10 Jan 2019 Joel Escudé Font, Marta R. Costa-jussà

We take advantage of the fact that word embeddings are used in neural machine translation to propose a method to equalize gender biases in neural machine translation using these representations.

Fairness Machine Translation +2

(Self-Attentive) Autoencoder-based Universal Language Representation for Machine Translation

no code implementations 15 Oct 2018 Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa

Preliminary results on the WMT 2017 Turkish/English task show that the proposed architecture is capable of learning a universal language representation and simultaneously training both translation directions with state-of-the-art results.

Machine Translation Sentence +1

A Neural Approach to Language Variety Translation

no code implementations COLING 2018 Marta R. Costa-jussà, Marcos Zampieri, Santanu Pal

In this paper we present the first neural-based machine translation system trained to translate between standard national varieties of the same language.

Machine Translation Translation

English-Catalan Neural Machine Translation in the Biomedical Domain through the cascade approach

no code implementations 19 Mar 2018 Marta R. Costa-jussà, Noe Casas, Maite Melero

This paper describes the methodology followed to build a neural machine translation system in the biomedical domain for the English-Catalan language pair.

Machine Translation Translation

Morphology Generation for Statistical Machine Translation using Deep Learning Techniques

no code implementations 7 Oct 2016 Marta R. Costa-jussà, Carlos Escolano

In this paper, we propose to de-couple machine translation from morphology generation in order to better deal with the problem.

Classification Gender Classification +3

Evaluating Indirect Strategies for Chinese-Spanish Statistical Machine Translation

no code implementations 4 Feb 2014 Marta R. Costa-jussà, Carlos A. Henríquez, Rafael E. Banchs

Although Chinese and Spanish are two of the most spoken languages in the world, not much research has been done in machine translation for this language pair.

Machine Translation Translation
