Search Results for author: Mikel Artetxe

Found 51 papers, 23 papers with code

PARADISE”:" Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

no code implementations RepL4NLP (ACL) 2022 Machel Reid, Mikel Artetxe

Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora and do not make use of the strong cross-lingual signal contained in parallel data.

Cross-Lingual Natural Language Inference Denoising +2

Gender-specific Machine Translation with Large Language Models

no code implementations6 Sep 2023 Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà

Decoder-only Large Language Models (LLMs) have demonstrated potential in machine translation (MT), albeit with performance slightly lagging behind traditional encoder-decoder Neural Machine Translation (NMT) systems.

Machine Translation NMT +1

Evaluation of Faithfulness Using the Longest Supported Subsequence

no code implementations23 Aug 2023 Anirudh Mittal, Timo Schick, Mikel Artetxe, Jane Dwivedi-Yu

Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.

Question Answering

Do Multilingual Language Models Think Better in English?

1 code implementation2 Aug 2023 Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models.

Common Sense Reasoning Cross-Lingual Natural Language Inference +6

Revisiting Machine Translation for Cross-lingual Classification

no code implementations23 May 2023 Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, Luke Zettlemoyer

Machine Translation (MT) has been widely used for cross-lingual classification, either by translating the test set into English and running inference with a monolingual model (translate-test), or translating the training set into the target languages and finetuning a multilingual model (translate-train).

Classification Cross-Lingual Transfer +2

CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models

no code implementations23 May 2023 Aitor Ormazabal, Mikel Artetxe, Eneko Agirre

Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model, and work by modifying its parameters.

Machine Translation

On the Role of Parallel Data in Cross-lingual Transfer Learning

no code implementations20 Dec 2022 Machel Reid, Mikel Artetxe

While prior work has established that the use of parallel data is conducive for cross-lingual learning, it is unclear if the improvements come from the data itself, or if it is the modeling of parallel interactions that matters.

Cross-Lingual Transfer Transfer Learning +2

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

no code implementations20 Dec 2022 Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe

Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen.

Cross-Lingual Transfer

Don't Prompt, Search! Mining-based Zero-Shot Learning with Language Models

no code implementations26 Oct 2022 Mozes van de Kar, Mengzhou Xia, Danqi Chen, Mikel Artetxe

Our results suggest that the success of prompting can partly be explained by the model being exposed to similar examples during pretraining, which can be directly retrieved through regular expressions.

Text Classification Text Infilling +2

Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models

1 code implementation30 May 2022 Mengzhou Xia, Mikel Artetxe, Jingfei Du, Danqi Chen, Ves Stoyanov

In this work, we adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.

Few-Shot Learning Text Infilling

Principled Paraphrase Generation with Parallel Corpora

1 code implementation ACL 2022 Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.

Machine Translation Paraphrase Generation +1

On the Role of Bidirectionality in Language Model Pre-Training

no code implementations24 May 2022 Mikel Artetxe, Jingfei Du, Naman Goyal, Luke Zettlemoyer, Ves Stoyanov

Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult.

Language Modelling Text Infilling

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation

1 code implementation24 May 2022 Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre

During inference, we build control codes for the desired meter and rhyme scheme, and condition our language model on them to generate formal verse poetry.

Language Modelling

Multilingual Machine Translation with Hyper-Adapters

3 code implementations22 May 2022 Christos Baziotis, Mikel Artetxe, James Cross, Shruti Bhosale

We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times less parameters.

Machine Translation Translation

Lifting the Curse of Multilinguality by Pre-training Modular Transformers

no code implementations NAACL 2022 Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe

Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.

named-entity-recognition Named Entity Recognition +3

Does Corpus Quality Really Matter for Low-Resource Languages?

no code implementations15 Mar 2022 Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa

For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with <33% for both mC4 and CC100.

Representation Learning

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

1 code implementation25 Feb 2022 Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer

Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.

Efficient Large Scale Language Modeling with Mixtures of Experts

no code implementations20 Dec 2021 Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.

Language Modelling

PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining

1 code implementation NAACL 2022 Machel Reid, Mikel Artetxe

Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora, and do not make use of the strong cross-lingual signal contained in parallel data.

Cross-Lingual Natural Language Inference Denoising +2

Multilingual Autoregressive Entity Linking

1 code implementation23 Mar 2021 Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time.

Ranked #2 on Entity Disambiguation on Mewsli-9 (using extra training data)

Entity Disambiguation Entity Linking

Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders

no code implementations29 May 2020 Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe

We propose a modular architecture of language-specific encoder-decoders that constitutes a multilingual machine translation system that can be incrementally extended to new languages without the need for retraining the existing system when adding new languages.

Machine Translation Natural Language Inference +1

A Call for More Rigor in Unsupervised Cross-lingual Learning

no code implementations ACL 2020 Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.

Cross-Lingual Word Embeddings Translation +2

Translation Artifacts in Cross-lingual Transfer Learning

1 code implementation EMNLP 2020 Mikel Artetxe, Gorka Labaka, Eneko Agirre

Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique.

Cross-Lingual Transfer Machine Translation +3

On the Cross-lingual Transferability of Monolingual Representations

6 code implementations ACL 2020 Mikel Artetxe, Sebastian Ruder, Dani Yogatama

This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.

Cross-Lingual Question Answering Language Modelling +1

Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora

no code implementations CL 2019 Pablo Gamallo, Susana Sotelo, Jos{\'e} Ramom Pichel, Mikel Artetxe

The contextualization of meaning is carried out by means of distributional composition within a structured vector space with syntactic dependencies, and the bilingual space is created by means of transfer rules and a bilingual dictionary.

Translation Word Translation

Bilingual Lexicon Induction through Unsupervised Machine Translation

1 code implementation ACL 2019 Mikel Artetxe, Gorka Labaka, Eneko Agirre

A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods.

Bilingual Lexicon Induction Language Modelling +6

Analyzing the Limitations of Cross-lingual Word Embedding Mappings

no code implementations ACL 2019 Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.

Bilingual Lexicon Induction Cross-Lingual Word Embeddings +1

An Effective Approach to Unsupervised Machine Translation

1 code implementation ACL 2019 Mikel Artetxe, Gorka Labaka, Eneko Agirre

While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only.

NMT Translation +1

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

13 code implementations TACL 2019 Mikel Artetxe, Holger Schwenk

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.

Cross-Lingual Bitext Mining Cross-Lingual Document Classification +5

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings

9 code implementations ACL 2019 Mikel Artetxe, Holger Schwenk

Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.

Cross-Lingual Bitext Mining Machine Translation +4

Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation

2 code implementations CONLL 2018 Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.

Word Embeddings

Unsupervised Statistical Machine Translation

3 code implementations EMNLP 2018 Mikel Artetxe, Gorka Labaka, Eneko Agirre

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018).

Language Modelling NMT +2

A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings

1 code implementation ACL 2018 Mikel Artetxe, Gorka Labaka, Eneko Agirre

Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training.

Cross-Lingual Word Embeddings Self-Learning +1

Unsupervised Neural Machine Translation

2 code implementations ICLR 2018 Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho

In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.

NMT Translation +1

Learning bilingual word embeddings with (almost) no bilingual data

no code implementations ACL 2017 Mikel Artetxe, Gorka Labaka, Eneko Agirre

Most methods to learn bilingual word embeddings rely on large parallel corpora, which is difficult to obtain for most language pairs.

Document Classification Entity Linking +5

Cannot find the paper you are looking for? You can Submit a new open access paper.