no code implementations • RepL4NLP (ACL) 2022 • Machel Reid, Mikel Artetxe
Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora and do not make use of the strong cross-lingual signal contained in parallel data.
1 code implementation • 18 Sep 2024 • Eduardo Sánchez, Belen Alastruey, Christophe Ropers, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà
We propose a new benchmark to measure a language model's linguistic reasoning skills without relying on pre-existing language-specific knowledge.
1 code implementation • 11 Jun 2024 • Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
To address this gap, we introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque.
1 code implementation • 3 May 2024 • Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay
We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models.
no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie
On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g., MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation.
1 code implementation • 29 Mar 2024 • Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters.
no code implementations • 6 Sep 2023 • Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà
While machine translation (MT) systems have seen significant improvements, it is still common for translations to reflect societal biases, such as gender bias.
2 code implementations • 31 Aug 2023 • Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).
no code implementations • 23 Aug 2023 • Anirudh Mittal, Timo Schick, Mikel Artetxe, Jane Dwivedi-Yu
Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.
1 code implementation • 2 Aug 2023 • Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
In this work, we introduce a new approach called self-translate, which overcomes the need for an external translation system by leveraging the few-shot translation capabilities of multilingual language models (sketched below).
Common Sense Reasoning, Cross-Lingual Natural Language Inference, +6
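A minimal sketch of the self-translate idea from the entry above: the same multilingual LM first translates the input into English and then solves the task on its own translation. The model name, prompts, and omitted few-shot examples are placeholders, not the paper's exact setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder multilingual causal LM; the paper evaluates several models.
model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated continuation.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def self_translate_then_solve(non_english_input: str, task_template: str) -> str:
    # Step 1: the model translates its own input (few-shot translation examples omitted).
    english = generate(f"Basque: {non_english_input}\nEnglish:").strip()
    # Step 2: the same model solves the task on the English translation.
    return generate(task_template.format(text=english))
```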
no code implementations • NeurIPS 2023 • Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
Pretrained language models (PLMs) are today the primary model for natural language processing.
no code implementations • 23 May 2023 • Mikel Artetxe, Vedanuj Goswami, Shruti Bhosale, Angela Fan, Luke Zettlemoyer
Machine Translation (MT) has been widely used for cross-lingual classification, either by translating the test set into English and running inference with a monolingual model (translate-test), or by translating the training set into the target languages and fine-tuning a multilingual model (translate-train).
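The two strategies named in this entry can be contrasted in a short, hedged sketch; the translation and classifier callables are hypothetical stand-ins, not any specific library API.

```python
def translate_test(test_set, classify_english, translate_to_english):
    """Translate the non-English test set into English and run an English-only classifier."""
    return [classify_english(translate_to_english(example)) for example in test_set]

def translate_train(english_train_set, target_languages, translate_from_english, finetune_multilingual):
    """Translate the English training set into each target language and fine-tune a
    multilingual classifier on the translated data; inference then runs directly on
    the original, untranslated test sets."""
    translated = [(translate_from_english(text, lang), label)
                  for text, label in english_train_set
                  for lang in target_languages]
    return finetune_multilingual(translated)
```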
no code implementations • 23 May 2023 • Aitor Ormazabal, Mikel Artetxe, Eneko Agirre
Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model, and work by modifying its parameters.
no code implementations • 20 Dec 2022 • Machel Reid, Mikel Artetxe
While prior work has established that the use of parallel data is conducive to cross-lingual learning, it is unclear if the improvements come from the data itself, or if it is the modeling of parallel interactions that matters.
no code implementations • 20 Dec 2022 • Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen.
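A sketch of that setup, assuming a HuggingFace masked LM: everything is frozen except the re-initialized token embeddings for the new language's vocabulary. The model name and vocabulary size are illustrative; in practice a tokenizer for the new language would be trained first.

```python
from transformers import XLMRobertaForMaskedLM

model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")  # placeholder MLM

# Swap in an embedding matrix sized for the new language's vocabulary
# (a new tokenizer would be trained separately; not shown here).
new_vocab_size = 30_000  # illustrative
model.resize_token_embeddings(new_vocab_size)

# Freeze the transformer body...
for param in model.parameters():
    param.requires_grad = False

# ...and train only the new input (and tied output) embeddings with the MLM objective.
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True
for param in model.get_output_embeddings().parameters():
    param.requires_grad = True
```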
1 code implementation • 19 Dec 2022 • Mengzhou Xia, Mikel Artetxe, Chunting Zhou, Xi Victoria Lin, Ramakanth Pasunuru, Danqi Chen, Luke Zettlemoyer, Ves Stoyanov
Why do larger language models demonstrate more desirable behaviors?
no code implementations • 26 Oct 2022 • Mozes van de Kar, Mengzhou Xia, Danqi Chen, Mikel Artetxe
Our results suggest that the success of prompting can partly be explained by the model being exposed to similar examples during pretraining, which can be directly retrieved through regular expressions.
no code implementations • 6 Oct 2022 • Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, Dennis Ulmer, Florian Schottmann, Khuyagbaatar Batsuren, Kaiser Sun, Koustuv Sinha, Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
We present a taxonomy for characterising and understanding generalisation research in NLP.
1 code implementation • 30 May 2022 • Mengzhou Xia, Mikel Artetxe, Jingfei Du, Danqi Chen, Ves Stoyanov
In this work, we adapt prompt-based few-shot learning to ELECTRA and show that it outperforms masked language models in a wide range of tasks.
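One way to picture prompting a discriminator like ELECTRA: fill a template with each candidate label word and keep the label whose word the replaced-token-detection head scores as most likely to be original. The template and verbalizers below are assumptions for illustration, not the paper's exact configuration.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

tokenizer = ElectraTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-base-discriminator")

def classify(sentence: str, verbalizers: dict) -> str:
    scores = {}
    for label, word in verbalizers.items():
        text = f"{sentence} It was {word}."   # illustrative template
        inputs = tokenizer(text, return_tensors="pt")
        position = (inputs["input_ids"][0] == tokenizer.convert_tokens_to_ids(word)).nonzero()[0].item()
        with torch.no_grad():
            logits = model(**inputs).logits[0]   # per-token "replaced" logits
        scores[label] = -logits[position].item()  # lower "replaced" score is better
    return max(scores, key=scores.get)

print(classify("The movie was a waste of time.", {"negative": "terrible", "positive": "great"}))
```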
no code implementations • 24 May 2022 • Mikel Artetxe, Jingfei Du, Naman Goyal, Luke Zettlemoyer, Ves Stoyanov
Prior work on language model pre-training has explored different architectures and learning objectives, but differences in data, hyperparameters and evaluation make a principled comparison difficult.
1 code implementation • 24 May 2022 • Aitor Ormazabal, Mikel Artetxe, Manex Agirrezabal, Aitor Soroa, Eneko Agirre
During inference, we build control codes for the desired meter and rhyme scheme, and condition our language model on them to generate formal verse poetry.
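A hedged sketch of control-code conditioning: prepend tokens that encode the target meter and rhyme scheme and let the LM generate the poem. The control-code format and the base model are placeholders, not the trained system from the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder base LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

control_codes = "<meter=hendecasyllable> <rhyme=ABBA>"   # hypothetical code format
inputs = tokenizer(control_codes + "\n", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=80, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```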
1 code implementation • ACL 2022 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision.
3 code implementations • 22 May 2022 • Christos Baziotis, Mikel Artetxe, James Cross, Shruti Bhosale
We find that hyper-adapters are more parameter efficient than regular adapters, reaching the same performance with up to 12 times fewer parameters.
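A rough sketch of the hyper-adapter idea in PyTorch: a shared hypernetwork maps a language embedding to the weights of a bottleneck adapter, so all languages share the hypernetwork's parameters instead of having separate adapters. Dimensions and the conditioning scheme are assumptions.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    def __init__(self, num_langs: int, lang_dim: int = 32, hidden: int = 512, bottleneck: int = 64):
        super().__init__()
        self.hidden, self.bottleneck = hidden, bottleneck
        self.lang_emb = nn.Embedding(num_langs, lang_dim)
        # The hypernetwork generates both adapter projections from the language embedding.
        self.to_down = nn.Linear(lang_dim, hidden * bottleneck)
        self.to_up = nn.Linear(lang_dim, bottleneck * hidden)

    def forward(self, x: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        z = self.lang_emb(lang_id)                                    # (batch, lang_dim)
        w_down = self.to_down(z).view(-1, self.hidden, self.bottleneck)
        w_up = self.to_up(z).view(-1, self.bottleneck, self.hidden)
        h = torch.relu(torch.bmm(x, w_down))                          # (batch, seq, bottleneck)
        return x + torch.bmm(h, w_up)                                 # residual adapter output

adapter = HyperAdapter(num_langs=100)
out = adapter(torch.randn(2, 10, 512), torch.tensor([3, 7]))          # (2, 10, 512)
```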
no code implementations • NAACL 2022 • Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
10 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
no code implementations • 15 Mar 2022 • Mikel Artetxe, Itziar Aldabe, Rodrigo Agerri, Olatz Perez-de-Viñaspre, Aitor Soroa
For instance, 66% of documents are rated as high-quality for EusCrawl, in contrast with less than 33% for both mC4 and CC100.
no code implementations • 14 Mar 2022 • Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li
All-MLP architectures have attracted increasing interest as an alternative to attention-based models.
Ranked #17 on Question Answering on StoryCloze
2 code implementations • 25 Feb 2022 • Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
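A minimal sketch of that setup: the LM is conditioned on a few input-label demonstrations and the prediction for a new input is read off from its continuation. Model, template, and examples are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

demonstrations = [
    ("The food was cold and bland.", "negative"),
    ("A delightful little restaurant.", "positive"),
]
query = "Service was quick and friendly."

prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
prompt += f"Review: {query}\nSentiment:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=2, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```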
2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.
1 code implementation • NAACL 2022 • Machel Reid, Mikel Artetxe
Despite the success of multilingual sequence-to-sequence pretraining, most existing approaches rely on monolingual corpora, and do not make use of the strong cross-lingual signal contained in parallel data.
no code implementations • ACL 2021 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction, Cross-Lingual Word Embeddings, +2
no code implementations • ACL 2020 • Ivana Kvapilikova, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
Existing models of multilingual sentence embeddings require large parallel data resources which are not available for low-resource languages.
1 code implementation • 23 Mar 2021 • Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni
Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time (see the sketch below).
Ranked #2 on Entity Disambiguation on Mewsli-9 (using extra training data)
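The marginalization described above can be sketched in a few lines: an entity's score is the sum, over languages, of the model's probability of generating its name in that language. Here score_name is a hypothetical stand-in for the seq2seq model's log-probability of a name given the mention context.

```python
import math
from collections import defaultdict

def rank_entities(mention_context, candidate_names, score_name):
    """candidate_names: dict mapping entity_id -> list of (language, name) pairs."""
    totals = defaultdict(float)
    for entity_id, names in candidate_names.items():
        for lang, name in names:
            log_p = score_name(mention_context, name, lang)   # log p(name, lang | context)
            totals[entity_id] += math.exp(log_p)              # marginalize out the language
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```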
no code implementations • 31 Dec 2020 • Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre
Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings.
Bilingual Lexicon Induction, Cross-Lingual Word Embeddings, +2
no code implementations • 29 May 2020 • Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe
We propose a modular architecture of language-specific encoder-decoders that constitutes a multilingual machine translation system and can be incrementally extended to new languages without retraining the existing system.
no code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre
We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them.
no code implementations • EACL 2021 • Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe
State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages.
1 code implementation • EMNLP 2020 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique.
no code implementations • 28 Feb 2020 • Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
In this paper, we analyze the role that such initialization plays in iterative back-translation.
7 code implementations • ACL 2020 • Mikel Artetxe, Sebastian Ruder, Dani Yogatama
This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.
no code implementations • CL 2019 • Pablo Gamallo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe
The contextualization of meaning is carried out by means of distributional composition within a structured vector space with syntactic dependencies, and the bilingual space is created by means of transfer rules and a bilingual dictionary.
1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods.
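A minimal numpy sketch of the retrieval step described above, using plain nearest-neighbor search (retrieval refinements such as CSLS are omitted). Embedding matrices are assumed to be row-normalized and already mapped to a shared space.

```python
import numpy as np

def induce_lexicon(src_emb: np.ndarray, tgt_emb: np.ndarray, src_words, tgt_words):
    """src_emb: (n_src, d), tgt_emb: (n_tgt, d), unit-normalized rows in a shared space."""
    sims = src_emb @ tgt_emb.T            # cosine similarities
    nearest = sims.argmax(axis=1)         # best target neighbor for each source word
    return {src_words[i]: tgt_words[j] for i, j in enumerate(nearest)}
```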
no code implementations • ACL 2019 • Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre
Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations.
Bilingual Lexicon Induction, Cross-Lingual Word Embeddings, +1
1 code implementation • ACL 2019 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only.
13 code implementations • TACL 2019 • Mikel Artetxe, Holger Schwenk
We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts.
Cross-Lingual Bitext Mining, Cross-Lingual Document Classification, +7
9 code implementations • ACL 2019 • Mikel Artetxe, Holger Schwenk
Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.
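A hedged numpy sketch of margin-based candidate scoring for parallel corpus mining, in the spirit of this entry: the cosine similarity of a candidate pair is normalized by the average similarity to each side's k nearest neighbors (a ratio margin). Inputs are assumed to be unit-normalized sentence embeddings.

```python
import numpy as np

def margin_scores(x: np.ndarray, y: np.ndarray, k: int = 4) -> np.ndarray:
    """x: (n, d) source embeddings, y: (m, d) target embeddings, both unit-normalized."""
    sims = x @ y.T                                         # (n, m) cosine similarities
    knn_x = np.sort(sims, axis=1)[:, -k:].mean(axis=1)     # avg similarity to k-NN of each source
    knn_y = np.sort(sims, axis=0)[-k:, :].mean(axis=0)     # avg similarity to k-NN of each target
    return sims / ((knn_x[:, None] + knn_y[None, :]) / 2.0)

# A pair (i, j) would typically be kept if its score passes a threshold and the pair is
# the mutual best match in both retrieval directions.
```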
2 code implementations • CONLL 2018 • Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre
Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness.
3 code implementations • EMNLP 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018).
Ranked #3 on Machine Translation on WMT2014 French-English
2 code implementations • ACL 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training.
2 code implementations • ICLR 2018 • Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho
In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs.
Ranked #6 on Machine Translation on WMT2015 English-German
no code implementations • ACL 2017 • Mikel Artetxe, Gorka Labaka, Eneko Agirre
Most methods to learn bilingual word embeddings rely on large parallel corpora, which is difficult to obtain for most language pairs.
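A hedged sketch of the self-learning loop behind this line of work: starting from a tiny seed dictionary, alternately learn an orthogonal mapping between the two embedding spaces (here via the Procrustes solution) and re-induce the dictionary by nearest-neighbor retrieval. Details differ from the published method; embeddings are assumed to be unit-normalized numpy arrays.

```python
import numpy as np

def procrustes(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||src @ W - tgt||_F over dictionary-aligned rows."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def self_learning(src_emb, tgt_emb, seed_pairs, iterations: int = 5):
    pairs = list(seed_pairs)                         # (src_index, tgt_index) seed dictionary
    for _ in range(iterations):
        s = np.array([i for i, _ in pairs])
        t = np.array([j for _, j in pairs])
        w = procrustes(src_emb[s], tgt_emb[t])       # step 1: mapping from the current dictionary
        sims = (src_emb @ w) @ tgt_emb.T
        pairs = list(enumerate(sims.argmax(axis=1))) # step 2: re-induce the dictionary
    return w, pairs
```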