no code implementations • 3 Oct 2024 • Franck Signe Talla, Herve Jegou, Edouard Grave
We address the problem of extending a pretrained large language model to a new domain that was not seen at training time, such as adding a language for which the original model has seen little or no training data.
1 code implementation • 17 Sep 2024 • Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave, Neil Zeghidour
Our resulting model is the first real-time full-duplex spoken large language model, with a theoretical latency of 160ms, 200ms in practice, and is available at https://github.com/kyutai-labs/moshi.
1 code implementation • 6 Jun 2024 • Xiou Ge, Ali Mousavi, Edouard Grave, Armand Joulin, Kun Qian, Benjamin Han, Mostafa Arefiyan, Yunyao Li
It is thus essential to design effective methods to both update obsolete knowledge and induce new knowledge into LLMs.
no code implementations • 22 Nov 2023 • Giovanni Monea, Armand Joulin, Edouard Grave
As an alternative, we propose to use parallel decoding as a way to draft multiple tokens from a single model, with no computational cost and no need for a second model.
52 code implementations • arXiv 2023 • Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
Ranked #3 on Few-Shot Learning on MedConceptsQA
1 code implementation • 15 Feb 2023 • Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann Lecun, Thomas Scialom
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
1 code implementation • 27 Sep 2022 • Jane Dwivedi-Yu, Timo Schick, Zhengbao Jiang, Maria Lomeli, Patrick Lewis, Gautier Izacard, Edouard Grave, Sebastian Riedel, Fabio Petroni
Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text.
no code implementations • 24 Aug 2022 • Timo Schick, Jane Dwivedi-Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel
Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes.
1 code implementation • 5 Aug 2022 • Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave
Retrieval-augmented models are known to excel at knowledge-intensive tasks without needing as many parameters, but it is unclear whether they work in few-shot settings.
Ranked #1 on Question Answering on Natural Questions
1 code implementation • 8 Jul 2022 • Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel
Hence, maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort.
2 code implementations • 29 Jan 2022 • Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks.
no code implementations • 20 Dec 2021 • Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jegou, Edouard Grave
Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets from different domains.
2 code implementations • 18 Dec 2021 • Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Oğuz, Edouard Grave, Wen-tau Yih, Sebastian Riedel
To address the increasing demands of real-world applications, research on knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inconsistent quality, and noise.
6 code implementations • 16 Dec 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.
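Not this paper's exact training recipe, but a minimal sketch of the contrastive (InfoNCE-style) objective typically used to train dense retrievers without supervision; the tensor shapes and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_retrieval_loss(q, k_pos, k_neg, temperature=0.05):
    """InfoNCE-style loss: each query embedding should score higher with its
    positive passage embedding than with any negative passage embedding.

    q:     (B, d) query embeddings
    k_pos: (B, d) positive passage embeddings (one per query)
    k_neg: (N, d) negative passage embeddings (shared across the batch)
    """
    pos = (q * k_pos).sum(dim=-1, keepdim=True) / temperature  # (B, 1)
    neg = (q @ k_neg.T) / temperature                           # (B, N)
    logits = torch.cat([pos, neg], dim=1)                       # (B, 1 + N)
    labels = torch.zeros(q.size(0), dtype=torch.long)           # positive is index 0
    return F.cross_entropy(logits, labels)
```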
no code implementations • 29 Sep 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
By contrast, in many other NLP tasks, conventional self-supervised pre-training based on masking leads to strong generalization with a small number of training examples.
16 code implementations • NeurIPS 2021 • Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
Ranked #1 on Image Classification on Certificate Verification
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
1 code implementation • 30 Dec 2020 • Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Sebastian Riedel, Edouard Grave
Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering and related tasks.
4 code implementations • ICLR 2021 • Gautier Izacard, Edouard Grave
A challenge of using such methods is to obtain supervised data to train the retriever model, corresponding to pairs of query and support documents.
Ranked #9 on Question Answering on NarrativeQA
8 code implementations • 21 Oct 2020 • Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
Existing work in translation has demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.
1 code implementation • NAACL 2021 • Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau
Unsupervised pre-training has led to much recent progress in natural language understanding.
8 code implementations • EACL 2021 • Gautier Izacard, Edouard Grave
Generative models for open domain question answering have proven to be competitive, without resorting to external knowledge.
Ranked #1 on Question Answering on ConditionalQA
4 code implementations • ICLR 2021 • Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
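As a rough illustration of the Straight-Through Estimator mentioned here (the baseline this paper improves on, not its proposed method), a minimal PyTorch-style sketch of uniform weight quantization whose backward pass simply lets gradients flow through the rounding:

```python
import torch

def quantize_ste(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Uniform symmetric quantization with a straight-through estimator:
    the forward pass uses quantized weights, while the backward pass treats
    rounding as the identity so gradients reach the full-precision weights."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # Value of w_q, gradient of w (the straight-through trick).
    return w + (w_q - w).detach()
```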
4 code implementations • 21 Feb 2020 • Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.
1 code implementation • 19 Nov 2019 • Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.
Ranked #19 on Speech Recognition on LibriSpeech test-other (using extra training data)
3 code implementations • ACL 2021 • Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin
To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets.
30 code implementations • ACL 2020 • Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.
2 code implementations • LREC 2020 • Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave
Pre-trained text representations have led to significant improvements in many areas of natural language processing.
no code implementations • ICLR 2020 • Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli
State-of-the-art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process.
no code implementations • 14 Oct 2019 • Piotr Bojanowski, Onur Celebi, Tomas Mikolov, Edouard Grave, Armand Joulin
In this paper, we focus on the problem of adapting word vector-based models to new textual data.
no code implementations • 25 Sep 2019 • Gautier Izacard, Armand Joulin, Edouard Grave
It is closely related to the problem of online learning of language models.
5 code implementations • ICLR 2020 • Angela Fan, Edouard Grave, Armand Joulin
Overparameterized transformer networks have obtained state-of-the-art results in various natural language processing tasks, such as machine translation, language modeling, and question answering.
Ranked #5 on Open-Domain Question Answering on ELI5
no code implementations • IJCNLP 2019 • Paula Czarnowska, Sebastian Ruder, Edouard Grave, Ryan Cotterell, Ann Copestake
Human translators routinely have to translate rare inflections of words, due to the Zipfian distribution of words in a language.
2 code implementations • 2 Jul 2019 • Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin
More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.
Ranked #5 on Language Modelling on Text8
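A minimal, single-head sketch of the persistent-memory attention described in the entry above: learnable key/value vectors are concatenated to the context keys and values, standing in for the feed-forward sublayer. The dimensions, initialization, and absence of a causal mask are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentMemoryAttention(nn.Module):
    """Single-head self-attention augmented with persistent key/value vectors."""

    def __init__(self, dim: int, n_persistent: int = 16):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Persistent vectors: learned, input-independent keys and values.
        self.pk = nn.Parameter(torch.randn(n_persistent, dim) * dim ** -0.5)
        self.pv = nn.Parameter(torch.randn(n_persistent, dim) * dim ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (T, dim)
        keys = torch.cat([self.k(x), self.pk], dim=0)      # (T + P, dim)
        values = torch.cat([self.v(x), self.pv], dim=0)    # (T + P, dim)
        attn = F.softmax(self.q(x) @ keys.T * x.size(-1) ** -0.5, dim=-1)
        return attn @ values                               # (T, dim)
```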
no code implementations • ACL 2019 • Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin
In this paper, we study the problem of hybrid language modeling, that is, using models which can predict both characters and larger units such as character n-grams or words.
2 code implementations • NAACL 2019 • Bora Edizel, Aleksandra Piktus, Piotr Bojanowski, Rui Ferreira, Edouard Grave, Fabrizio Silvestri
In this paper we present a method to learn word embeddings that are resilient to misspellings.
7 code implementations • ACL 2019 • Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
We propose a novel self-attention mechanism that can learn its optimal attention span.
Ranked #4 on Language Modelling on Text8
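A sketch of the soft masking function behind the adaptive attention span described above: attention weights are ramped down to zero beyond a learned span. The ramp length and the per-head parameterization are illustrative assumptions.

```python
import torch

def adaptive_span_mask(distances: torch.Tensor, span: torch.Tensor,
                       ramp: float = 32.0) -> torch.Tensor:
    """Soft mask in [0, 1]: positions within `span` tokens are fully attended,
    the mask decays linearly over `ramp` positions, and is zero beyond that.
    `span` would be a learnable (per-head) parameter, regularized to stay small.
    `distances` holds the distance of each key position to the current query."""
    return torch.clamp((span + ramp - distances) / ramp, min=0.0, max=1.0)
```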
no code implementations • ICLR 2019 • Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen
Work on the problem of contextualized word representation—the development of reusable neural network components for sentence understanding—has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo (Peters et al., 2018).
no code implementations • ACL 2019 • Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman
Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling.
no code implementations • 2 Nov 2018 • Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin
This paper extends this line of work to the problem of aligning multiple languages to a common space.
3 code implementations • 29 May 2018 • Edouard Grave, Armand Joulin, Quentin Berthet
A library for Multilingual Unsupervised or Supervised word Embeddings
no code implementations • 20 Apr 2018 • Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave
It is often the case that the best-performing language model is an ensemble of a neural language model with n-grams.
4 code implementations • EMNLP 2018 • Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave
Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space.
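The standard baseline for this kind of alignment (not necessarily the exact criterion proposed in this paper) is orthogonal Procrustes over a seed dictionary; a minimal sketch:

```python
import numpy as np

def procrustes_align(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: the rotation W minimizing ||X W - Y||_F, where
    rows i of X and Y are the embeddings of a seed-dictionary translation pair.
    Source-language vectors are then mapped into the target space with X @ W."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```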
2 code implementations • NAACL 2018 • Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco Baroni
Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language.
2 code implementations • LREC 2018 • Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov
Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance.
Ranked #12 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
5 code implementations • LREC 2018 • Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin
Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl.
2 code implementations • NeurIPS 2017 • Edouard Grave, Moustapha Cisse, Armand Joulin
Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution.
1 code implementation • 30 Oct 2017 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov
This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings.
1 code implementation • ICML 2017 • Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier
We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1.
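A sketch of the kind of retraction update used to keep weight matrices approximately orthonormal, and hence the layer roughly 1-Lipschitz; the step size is an illustrative value and this is a simplification of the paper's full procedure.

```python
import torch

def parseval_retraction(W: torch.Tensor, beta: float = 3e-4) -> torch.Tensor:
    """One retraction step pulling the rows of W toward an orthonormal set,
    which keeps the spectral norm (Lipschitz constant) of the layer near 1."""
    return (1 + beta) * W - beta * W @ W.t() @ W
```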
14 code implementations • 13 Dec 2016 • Edouard Grave, Armand Joulin, Nicolas Usunier
We propose an extension to neural network language models to adapt their prediction to the recent history.
Ranked #33 on Language Modelling on WikiText-2
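A minimal sketch of the cache interpolation described above: a distribution over recently seen tokens is built from hidden-state similarity and mixed with the base model's next-token distribution. The hyperparameter values and shapes are illustrative assumptions.

```python
import torch

def cache_lm_probs(p_model: torch.Tensor, h_t: torch.Tensor,
                   past_hiddens: torch.Tensor, past_tokens: torch.Tensor,
                   theta: float = 0.3, lam: float = 0.1) -> torch.Tensor:
    """p_model:      (V,) next-token distribution from the base language model
    h_t:          (d,) current hidden state
    past_hiddens: (T, d) hidden states stored in the cache
    past_tokens:  (T,) int64 ids of the tokens that followed each stored state"""
    weights = torch.softmax(theta * past_hiddens @ h_t, dim=0)            # (T,)
    p_cache = torch.zeros_like(p_model).scatter_add_(0, past_tokens, weights)
    return (1.0 - lam) * p_model + lam * p_cache
```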
43 code implementations • 12 Dec 2016 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hérve Jégou, Tomas Mikolov
We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.
no code implementations • 18 Nov 2016 • Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov
Recurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data.
12 code implementations • ICML 2017 • Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies.
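PyTorch's `nn.AdaptiveLogSoftmaxWithLoss` provides an implementation of an adaptive softmax of this kind; a minimal usage sketch, with toy dimensions and frequency-based cluster cutoffs chosen for illustration:

```python
import torch
import torch.nn as nn

# Hidden states from a language model and their target token ids (toy sizes).
hidden = torch.randn(128, 512)
targets = torch.randint(0, 100_000, (128,))

# The vocabulary is split into a frequent head and rarer tail clusters; tail
# clusters use smaller projections, which makes large vocabularies tractable.
adaptive = nn.AdaptiveLogSoftmaxWithLoss(in_features=512, n_classes=100_000,
                                         cutoffs=[2_000, 10_000], div_value=4.0)
output = adaptive(hidden, targets)
print(output.loss)  # negative log-likelihood averaged over the batch
```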
53 code implementations • TACL 2017 • Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov
A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations.
Ranked #3 on Word Similarity on WS353
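A minimal sketch of the subword scheme described above: a word is wrapped in boundary symbols, split into character n-grams, and its vector is the sum of the n-gram vectors. The `ngram_vectors` lookup table is a hypothetical stand-in; the full model also hashes n-grams and adds a dedicated vector for the word itself.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of `word`, with '<' and '>' marking word boundaries."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word, ngram_vectors, dim=300):
    """Word vector computed as the sum of its character n-gram vectors."""
    grams = [g for g in char_ngrams(word) if g in ngram_vectors]
    if not grams:
        return np.zeros(dim)
    return np.sum([ngram_vectors[g] for g in grams], axis=0)
```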
64 code implementations • EACL 2017 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
This paper explores a simple and efficient baseline for text classification.
Ranked #1 on Sentiment Analysis on Sogou News
Emotion Recognition in Conversation, General Classification, +2
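A usage sketch with the open-source `fasttext` Python package that accompanies this line of work; the file path and hyperparameters are placeholders, and the training data is expected as one example per line with `__label__`-prefixed labels.

```python
import fasttext

# Supervised text classification on a hypothetical training file.
model = fasttext.train_supervised(input="train.txt",
                                  lr=0.5, epoch=5, wordNgrams=2)
labels, probs = model.predict("this movie was surprisingly good")
print(labels, probs)
```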
no code implementations • 28 Mar 2016 • Shaodian Zhang, Edouard Grave, Elizabeth Sklar, Noemie Elhadad
Identifying topics of discussions in online health communities (OHC) is critical to various applications, but can be difficult because topics of OHC content are usually heterogeneous and domain-dependent.
no code implementations • ICCV 2015 • Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid
Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.
no code implementations • 14 Dec 2013 • Edouard Grave, Guillaume Obozinski, Francis Bach
Most natural language processing systems based on machine learning are not robust to domain shift.
no code implementations • NeurIPS 2011 • Edouard Grave, Guillaume R. Obozinski, Francis R. Bach
This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity.
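For reference, a statement of the penalty the abstract describes, assuming a design matrix $X$ with unit-norm columns and a weight vector $w$:

$$\Omega(w) \;=\; \big\| X \,\mathrm{Diag}(w) \big\|_{*},$$

where $\|\cdot\|_{*}$ denotes the trace (nuclear) norm. When the covariates are orthogonal this reduces to the $\ell_1$ norm $\|w\|_1$, and when they are all identical it reduces to the $\ell_2$ norm $\|w\|_2$, so the penalty adapts between Lasso-like and ridge-like behaviour depending on how correlated the selected covariates are.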