Search Results for author: Edouard Grave

Found 50 papers, 32 papers with code

Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

no code implementations20 Dec 2021 Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jegou, Edouard Grave

Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets, from different domains.

Computer Vision Denoising +2

The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus

2 code implementations18 Dec 2021 Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Oğuz, Edouard Grave, Wen-tau Yih, Sebastian Riedel

In order to address increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inconsistent quality and noise.

Common Sense Reasoning

Unsupervised Dense Information Retrieval with Contrastive Learning

3 code implementations16 Dec 2021 Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.

Contrastive Learning Cross-Lingual Transfer +3

Contrastive Pre-training for Zero-Shot Information Retrieval

no code implementations29 Sep 2021 Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

By contrast, in many other NLP tasks, conventional self-supervised pre-training based on masking leads to strong generalization with small number of training examples.

Contrastive Learning Fact Checking +3

A Memory Efficient Baseline for Open Domain Question Answering

1 code implementation30 Dec 2020 Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Sebastian Riedel, Edouard Grave

Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks.

Dimensionality Reduction Open-Domain Question Answering +1

Distilling Knowledge from Reader to Retriever for Question Answering

2 code implementations ICLR 2021 Gautier Izacard, Edouard Grave

A challenge of using such methods is to obtain supervised data to train the retriever model, corresponding to pairs of query and support documents.

Information Retrieval Knowledge Distillation +2

Beyond English-Centric Multilingual Machine Translation

4 code implementations21 Oct 2020 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation Translation

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

4 code implementations EACL 2021 Gautier Izacard, Edouard Grave

Generative models for open domain question answering have proven to be competitive, without resorting to external knowledge.

 Ranked #1 on Question Answering on SQuAD (F1 metric)

Open-Domain Question Answering Passage Retrieval

Training with Quantization Noise for Extreme Model Compression

3 code implementations ICLR 2021 Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin

A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.

Image Generation Model Compression +1

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

1 code implementation19 Nov 2019 Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert

We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.

Ranked #14 on Speech Recognition on LibriSpeech test-other (using extra training data)

speech-recognition Speech Recognition

CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB

3 code implementations ACL 2021 Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin

To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets.

Translation

Unsupervised Cross-lingual Representation Learning at Scale

24 code implementations ACL 2020 Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.

Cross-Lingual Transfer Language Modelling +2

Depth-Adaptive Transformer

no code implementations ICLR 2020 Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli

State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process.

Machine Translation Translation

Reducing Transformer Depth on Demand with Structured Dropout

4 code implementations ICLR 2020 Angela Fan, Edouard Grave, Armand Joulin

Overparameterized transformer networks have obtained state of the art results in various natural language processing tasks, such as machine translation, language modeling, and question answering.

Language Modelling Machine Translation +3

Augmenting Self-attention with Persistent Memory

6 code implementations2 Jul 2019 Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.

Language Modelling Translation

Training Hybrid Language Models by Marginalizing over Segmentations

no code implementations ACL 2019 Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Arm Joulin,

In this paper, we study the problem of hybrid language modeling, that is using models which can predict both characters and larger units such as character ngrams or words.

Language Modelling

Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling

no code implementations ICLR 2019 Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen

Work on the problem of contextualized word representation—the development of reusable neural network components for sentence understanding—has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo (Peters et al., 2018).

Language Modelling

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

4 code implementations EMNLP 2018 Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave

Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space.

Translation Word Translation

Lightweight Adaptive Mixture of Neural and N-gram Language Models

no code implementations20 Apr 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

It is often the case that the best performing language model is an ensemble of a neural language model with n-grams.

Language Modelling

Colorless green recurrent networks dream hierarchically

2 code implementations NAACL 2018 Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco Baroni

Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language.

Language Modelling

Learning Word Vectors for 157 Languages

2 code implementations LREC 2018 Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov

Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance.

Natural Language Processing

Advances in Pre-Training Distributed Word Representations

5 code implementations LREC 2018 Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin

Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl.

Natural Language Processing

Unbounded cache model for online language modeling with open vocabulary

2 code implementations NeurIPS 2017 Edouard Grave, Moustapha Cisse, Armand Joulin

Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution.

Language Modelling Quantization

Fast Linear Model for Knowledge Graph Embeddings

1 code implementation30 Oct 2017 Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov

This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings.

General Classification Knowledge Base Completion +2

Parseval Networks: Improving Robustness to Adversarial Examples

1 code implementation ICML 2017 Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier

We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1.

Improving Neural Language Models with a Continuous Cache

13 code implementations13 Dec 2016 Edouard Grave, Armand Joulin, Nicolas Usunier

We propose an extension to neural network language models to adapt their prediction to the recent history.

Language Modelling

FastText.zip: Compressing text classification models

41 code implementations12 Dec 2016 Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hérve Jégou, Tomas Mikolov

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

Classification General Classification +3

Variable Computation in Recurrent Neural Networks

no code implementations18 Nov 2016 Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov

Recurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data.

Efficient softmax approximation for GPUs

12 code implementations ICML 2017 Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies.

Enriching Word Vectors with Subword Information

48 code implementations TACL 2017 Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov

A vector representation is associated to each character $n$-gram; words being represented as the sum of these representations.

Natural Language Processing Word Embeddings +1

Longitudinal Analysis of Discussion Topics in an Online Breast Cancer Community using Convolutional Neural Networks

no code implementations28 Mar 2016 Shaodian Zhang, Edouard Grave, Elizabeth Sklar, Noemie Elhadad

Identifying topics of discussions in online health communities (OHC) is critical to various applications, but can be difficult because topics of OHC content are usually heterogeneous and domain-dependent.

General Classification Topic Classification

Weakly-Supervised Alignment of Video With Text

no code implementations ICCV 2015 Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid

Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.

Trace Lasso: a trace norm regularization for correlated designs

no code implementations NeurIPS 2011 Edouard Grave, Guillaume R. Obozinski, Francis R. Bach

This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity.

Cannot find the paper you are looking for? You can Submit a new open access paper.