Search Results for author: Armand Joulin

Found 80 papers, 59 papers with code

Efficient Optimization for Discriminative Latent Class Models

no code implementations NeurIPS 2010 Armand Joulin, Jean Ponce, Francis R. Bach

To avoid this problem, we introduce a local approximation of this cost function, which leads to a quadratic non-convex optimization problem over a product of simplices.

Clustering, Document Classification, +2

Unsupervised Joint Object Discovery and Segmentation in Internet Images

no code implementations CVPR 2013 Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu

In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as is typical of datasets collected from Internet search.

Object, Object Discovery, +1

Recovering Stereo Pairs from Anaglyphs

no code implementations CVPR 2013 Armand Joulin, Sing Bing Kang

An anaglyph is a single image created by selecting complementary colors from a stereo color pair; the user can perceive depth by viewing it through color-filtered glasses.

Learning Longer Memory in Recurrent Neural Networks

5 code implementations 24 Dec 2014 Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent.
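The mechanism behind this result is a recurrent cell with context units constrained to change slowly. A minimal PyTorch sketch of such a slowly updated state, with illustrative module and parameter names rather than the authors' code:

```python
import torch
import torch.nn as nn

class SCRNCell(nn.Module):
    """Sketch of a structurally constrained recurrent cell: a context
    state updated as a slow exponential moving average of the input,
    s_t = (1 - alpha) * B(x_t) + alpha * s_{t-1}, feeds the usual fast
    hidden state and carries longer-term information."""

    def __init__(self, vocab_size, context_dim, hidden_dim, alpha=0.95):
        super().__init__()
        self.alpha = alpha
        self.B = nn.Embedding(vocab_size, context_dim)  # slow input projection
        self.A = nn.Embedding(vocab_size, hidden_dim)   # fast input projection
        self.P = nn.Linear(context_dim, hidden_dim, bias=False)
        self.R = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, token, s, h):
        s = (1 - self.alpha) * self.B(token) + self.alpha * s
        h = torch.sigmoid(self.P(s) + self.A(token) + self.R(h))
        return s, h
```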

Language Modelling

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

20 code implementations 19 Feb 2015 Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov

One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent.

Question Answering, Reading Comprehension

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

4 code implementations NeurIPS 2015 Armand Joulin, Tomas Mikolov

Despite the recent achievements in machine learning, we are still very far from achieving real artificial intelligence.

Learning Visual Features from Large Weakly Supervised Data

no code implementations 6 Nov 2015 Armand Joulin, Laurens van der Maaten, Allan Jabri, Nicolas Vasilache

We train convolutional networks on a dataset of 100 million Flickr photos and captions, and show that these networks produce features that perform well in a range of vision problems.

Representation Learning, Word Similarity

Alternative structures for character-level RNNs

1 code implementation 19 Nov 2015 Piotr Bojanowski, Armand Joulin, Tomas Mikolov

The first one consists of conditioning the character-level representation on the previous word representation.

Language Modelling

Learning Simple Algorithms from Examples

1 code implementation 23 Nov 2015 Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus

We present an approach for learning simple algorithms such as copying, multi-digit addition and single digit multiplication directly from examples.

Q-Learning

A Roadmap towards Machine Intelligence

1 code implementation 25 Nov 2015 Tomas Mikolov, Armand Joulin, Marco Baroni

The development of intelligent machines is one of the biggest unsolved challenges in computer science.

Locally-Optimized Inter-Subject Alignment of Functional Cortical Regions

no code implementations 7 Jun 2016 Marius Cătălin Iordan, Armand Joulin, Diane M. Beck, Li Fei-Fei

Our method outperforms the two most commonly used alternatives (anatomical landmark-based AFNI alignment and cortical convexity-based FreeSurfer alignment) in overlap between predicted region and functionally-defined LOC.

Revisiting Visual Question Answering Baselines

3 code implementations 27 Jun 2016 Allan Jabri, Armand Joulin, Laurens van der Maaten

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding.

Binary Classification, Multiple-choice, +2

Enriching Word Vectors with Subword Information

52 code implementations TACL 2017 Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov

A vector representation is associated with each character $n$-gram; words are represented as the sum of these representations.
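A minimal sketch of that representation; the bucket count, dimensionality, and hashing are illustrative toy choices (fastText itself hashes n-grams with FNV into a fixed table):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary markers, e.g. '<wh', 'whe', 're>'."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

rng = np.random.default_rng(0)
NUM_BUCKETS, DIM = 100_000, 50  # toy sizes
bucket_vectors = rng.normal(scale=0.1, size=(NUM_BUCKETS, DIM))

def word_vector(word):
    """A word is the sum of the vectors of its character n-grams
    (Python's salted hash stands in for fastText's FNV hashing)."""
    return sum(bucket_vectors[hash(g) % NUM_BUCKETS]
               for g in char_ngrams(word))

v = word_vector("where")  # out-of-vocabulary words get vectors too
```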

Word Embeddings, Word Similarity

Efficient softmax approximation for GPUs

12 code implementations ICML 2017 Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies.
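PyTorch ships an implementation of this adaptive softmax as nn.AdaptiveLogSoftmaxWithLoss. A minimal usage sketch with illustrative sizes and cutoffs, assuming the vocabulary is sorted by decreasing frequency:

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size = 512, 100_000
# Cutoffs split the vocabulary into a small, frequent head and
# progressively larger, cheaper-to-evaluate tail clusters.
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000, 50_000],
)

hidden = torch.randn(32, hidden_dim)           # e.g. RNN outputs
targets = torch.randint(0, vocab_size, (32,))  # next-word ids
out = adaptive_softmax(hidden, targets)        # out.output: target log-probs
out.loss.backward()
```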

Variable Computation in Recurrent Neural Networks

no code implementations 18 Nov 2016 Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov

Recurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data.

FastText.zip: Compressing text classification models

42 code implementations 12 Dec 2016 Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, Tomas Mikolov

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

General Classification, Quantization, +2

Improving Neural Language Models with a Continuous Cache

14 code implementations 13 Dec 2016 Edouard Grave, Armand Joulin, Nicolas Usunier

We propose an extension to neural network language models to adapt their prediction to the recent history.
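A sketch of this cache mechanism on top of an arbitrary language model; theta and lam are illustrative hyperparameters, not the paper's tuned values:

```python
import torch
import torch.nn.functional as F

def cached_next_word_probs(logits, hidden, cache_h, cache_ids,
                           vocab_size, theta=0.3, lam=0.1):
    """Linearly interpolate the model's distribution with a cache
    distribution built from dot products between the current hidden
    state and hidden states stored for recently seen words."""
    p_model = F.softmax(logits, dim=-1)                  # (vocab,)
    weights = F.softmax(theta * (cache_h @ hidden), dim=-1)
    p_cache = torch.zeros(vocab_size).index_add_(0, cache_ids, weights)
    return (1 - lam) * p_model + lam * p_cache
```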

Language Modelling

CommAI: Evaluating the first steps towards a useful general AI

no code implementations 31 Jan 2017 Marco Baroni, Armand Joulin, Allan Jabri, Germán Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov

With machine learning successfully applied to new daunting problems almost every day, general AI is starting to look like an attainable goal.

BIG-bench Machine Learning, Continual Learning, +2

Unsupervised Learning by Predicting Noise

1 code implementation ICML 2017 Piotr Bojanowski, Armand Joulin

We propose to fix a set of target representations, called Noise As Targets (NAT), and to constrain the deep features to align to them.
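A sketch of one such alignment step under stated assumptions (unit-sphere targets, per-batch one-to-one reassignment via the Hungarian algorithm); sizes and names are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
K, DIM = 4_096, 128
targets = rng.normal(size=(K, DIM))
targets /= np.linalg.norm(targets, axis=1, keepdims=True)  # fixed noise targets

def reassign(features, target_batch):
    """One-to-one assignment maximizing feature/target dot products;
    the network is then trained to map each image onto its target."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    _, cols = linear_sum_assignment(-feats @ target_batch.T)
    return target_batch[cols]
```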

Optimizing the Latent Space of Generative Networks

6 code implementations ICML 2018 Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images.

Fast Linear Model for Knowledge Graph Embeddings

1 code implementation 30 Oct 2017 Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov

This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings.

General Classification, Knowledge Base Completion, +2

Unbounded cache model for online language modeling with open vocabulary

2 code implementations NeurIPS 2017 Edouard Grave, Moustapha Cisse, Armand Joulin

Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution.

Language Modelling, Quantization

Advances in Pre-Training Distributed Word Representations

5 code implementations LREC 2018 Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin

Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl.

Learning Word Vectors for 157 Languages

2 code implementations LREC 2018 Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov

Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance.

Ranked #12 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)

Only Connect Walls Dataset Task 1 (Grouping)

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

4 code implementations EMNLP 2018 Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave

Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space.
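The paper optimizes a retrieval (CSLS) criterion; as a point of reference, the classic Procrustes baseline it improves upon fits an orthogonal map from a seed dictionary in closed form:

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||XW - Y||_F, where rows of X and Y are
    word vectors of seed translation pairs; X @ W then maps the source
    embedding space into the target space."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```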

regression, Retrieval, +2

Deep Clustering for Unsupervised Learning of Visual Features

9 code implementations ECCV 2018 Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze

In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.
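A single-process sketch of one DeepCluster round, assuming a non-shuffled loader that yields 1-tuples of image batches; the full method also handles empty clusters, PCA whitening, and classifier re-initialization:

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def deepcluster_round(model, classifier, loader, optimizer, k=1000):
    # 1) Cluster the current features to obtain pseudo-labels.
    model.eval()
    with torch.no_grad():
        feats = torch.cat([model(x) for (x,) in loader])
    labels = KMeans(n_clusters=k, n_init=1).fit_predict(feats.numpy())

    # 2) Train the network to predict its own cluster assignments.
    model.train()
    for i, (x,) in enumerate(loader):
        start = i * loader.batch_size
        y = torch.as_tensor(labels[start:start + len(x)])
        loss = F.cross_entropy(classifier(model(x)), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```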

 Ranked #1 on Image Clustering on CIFAR-100 (Train Set metric, using extra training data)

Clustering, Deep Clustering, +2

Cooperative Learning of Disjoint Syntax and Semantics

1 code implementation NAACL 2019 Serhii Havrylov, Germán Kruszewski, Armand Joulin

There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics.

Domain Generalization, Natural Language Inference, +1

Unsupervised Pre-Training of Image Features on Non-Curated Data

2 code implementations ICCV 2019 Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin

Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available.

Clustering, Self-Supervised Image Classification, +1

Augmenting Self-attention with Persistent Memory

2 code implementations 2 Jul 2019 Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.
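A minimal single-head sketch of that augmentation; the number of persistent vectors and the module names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentSelfAttention(nn.Module):
    """Single-head sketch: N learned, input-independent key/value pairs
    are concatenated to the keys and values seen by every query,
    playing the role of the feed-forward sublayer."""

    def __init__(self, dim, n_persistent=16):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.mem_k = nn.Parameter(torch.randn(n_persistent, dim))
        self.mem_v = nn.Parameter(torch.randn(n_persistent, dim))

    def forward(self, x):                          # x: (seq, dim)
        q = self.q(x)
        k = torch.cat([self.k(x), self.mem_k])     # (seq + N, dim)
        v = torch.cat([self.v(x), self.mem_v])
        attn = F.softmax(q @ k.t() / x.size(-1) ** 0.5, dim=-1)
        return attn @ v
```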

Language Modelling, Translation

Why Build an Assistant in Minecraft?

1 code implementation 22 Jul 2019 Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston

In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.

Natural Language Understanding

Reducing Transformer Depth on Demand with Structured Dropout

5 code implementations ICLR 2020 Angela Fan, Edouard Grave, Armand Joulin

Overparameterized transformer networks have obtained state-of-the-art results in various natural language processing tasks, such as machine translation, language modeling, and question answering.
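A minimal sketch of the structured dropout used here (LayerDrop), with an illustrative drop rate:

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """During training, each layer is skipped independently with
    probability p; at test time, sub-networks of any depth can be
    obtained by keeping a subset of layers, without fine-tuning."""

    def __init__(self, layers, p=0.2):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.p = p

    def forward(self, x):
        for layer in self.layers:
            if self.training and torch.rand(()) < self.p:
                continue                 # structured drop: whole layer
            x = layer(x)
        return x
```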

Language Modelling, Machine Translation, +2

Finding Winning Tickets with Limited (or No) Supervision

no code implementations 25 Sep 2019 Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin

The lottery ticket hypothesis argues that neural networks contain sparse subnetworks, which, if appropriately initialized (the winning tickets), are capable of matching the accuracy of the full network when trained in isolation.

CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB

3 code implementations ACL 2021 Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin

To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets.

NMT, Sentence, +2

Libri-Light: A Benchmark for ASR with Limited or No Supervision

2 code implementations 17 Dec 2019 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux

Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).

 Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-within metric)

speech-recognition, Speech Recognition

Pruning Convolutional Neural Networks with Self-Supervision

no code implementations 10 Jan 2020 Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin

In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e. on self-supervised tasks).

Unsupervised pretraining transfers well across languages

3 code implementations 7 Feb 2020 Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux

Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Learning to Visually Navigate in Photorealistic Environments Without any Supervision

no code implementations 10 Apr 2020 Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski

Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.

Navigate, Position

Training with Quantization Noise for Extreme Model Compression

4 code implementations ICLR 2021 Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin

A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
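A sketch of that training scheme, including the random-subset twist of Quant-Noise where only a fraction of weights is quantized at each step; the step size and rate are illustrative:

```python
import torch

def quantize_with_ste(w, step=0.05, p=1.0):
    """Forward pass sees (partially) quantized weights; backward treats
    the quantizer as identity (Straight-Through Estimator). With p < 1,
    only a random fraction of weights is quantized each step, as in
    Quant-Noise."""
    w_q = torch.round(w / step) * step            # uniform scalar quantizer
    mask = (torch.rand_like(w) < p).to(w.dtype)   # subset to quantize
    noisy = mask * w_q + (1 - mask) * w
    return w + (noisy - w).detach()               # gradient flows to w
```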

Image Generation, Model Compression

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

16 code implementations NeurIPS 2020 Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin

In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much.
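A sketch of multi-crop with torchvision; the crop sizes, scale ranges, and number of views are illustrative choices:

```python
from torchvision import transforms

global_view = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
local_view = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.05, 0.4)),  # low resolution
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def multi_crop(image, n_local=6):
    """Two global views plus several cheap low-resolution local views,
    instead of two full-resolution views."""
    return ([global_view(image) for _ in range(2)] +
            [local_view(image) for _ in range(n_local)])
```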

Contrastive Learning, Data Augmentation, +2

Target Conditioning for One-to-Many Generation

no code implementations Findings of the Association for Computational Linguistics 2020 Marie-Anne Lachaux, Armand Joulin, Guillaume Lample

In this paper, we propose to explicitly model this one-to-many mapping by conditioning the decoder of an NMT model on a latent variable that represents the domain of target sentences.

Machine Translation, NMT, +2

Beyond English-Centric Multilingual Machine Translation

7 code implementations 21 Oct 2020 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation, Translation

Self-Supervised Pretraining of 3D Features on any Point-Cloud

1 code implementation ICCV 2021 Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra

Pretraining on large labeled datasets is a prerequisite for good performance in many computer vision tasks, such as 2D object recognition and video classification.

Object, object-detection, +4

Emerging Properties in Self-Supervised Vision Transformers

26 code implementations ICCV 2021 Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).

Copy Detection, Image Retrieval, +7

XCiT: Cross-Covariance Image Transformers

11 code implementations NeurIPS 2021 Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.
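A single-head sketch of this cross-covariance attention; in the paper the temperature is learned and queries, keys, and values come from the usual linear projections:

```python
import torch.nn.functional as F

def cross_covariance_attention(q, k, v, tau=1.0):
    """q, k, v: (tokens, dim). Queries and keys are L2-normalized along
    the token axis; the (dim x dim) attention map mixes feature
    channels, so the cost is linear in the number of tokens."""
    qn = F.normalize(q, dim=0)
    kn = F.normalize(k, dim=0)
    attn = F.softmax(qn.t() @ kn / tau, dim=-1)   # (dim, dim)
    return v @ attn.t()                           # (tokens, dim)
```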

Instance Segmentation, object-detection, +3

Contrastive Pre-training for Zero-Shot Information Retrieval

no code implementations 29 Sep 2021 Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

By contrast, in many other NLP tasks, conventional self-supervised pre-training based on masking leads to strong generalization with a small number of training examples.

Contrastive Learning, Fact Checking, +3

Learning Co-segmentation by Segment Swapping for Retrieval and Discovery

1 code implementation 29 Oct 2021 Xi Shen, Alexei A. Efros, Armand Joulin, Mathieu Aubry

The goal of this work is to efficiently identify visually similar patterns in images, e.g. identifying an artwork detail copied between an engraving and an oil painting, or recognizing parts of a night-time photograph visible in its daytime counterpart.

Graph Clustering, Object Discovery, +3

Unsupervised Dense Information Retrieval with Contrastive Learning

6 code implementations 16 Dec 2021 Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.
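A sketch of the in-batch contrastive objective such dense retrievers are trained with; the temperature is illustrative, and in the paper positives are built by cropping two views of the same document:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, d_emb, tau=0.05):
    """Each query must score its own passage (the diagonal) above all
    other in-batch passages. q_emb, d_emb: (batch, dim)."""
    scores = q_emb @ d_emb.t() / tau
    labels = torch.arange(q_emb.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)
```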

Contrastive Learning, Cross-Lingual Transfer, +4

Detecting Twenty-thousand Classes using Image-level Supervision

1 code implementation 7 Jan 2022 Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra

For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning.

Image Classification, Open Vocabulary Object Detection

Omnivore: A Single Model for Many Visual Modalities

2 code implementations CVPR 2022 Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra

Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.

 Ranked #1 on Scene Recognition on SUN-RGBD (using extra training data)

Action Classification, Action Recognition, +3

Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

1 code implementation 16 Feb 2022 Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

Discriminative self-supervised learning allows training models on any random group of internet images, potentially recovering salient information that helps differentiate between the images.

 Ranked #1 on Copy Detection on Copydays strong subset (using extra training data)

Action Classification, Action Recognition, +12

OmniMAE: Single Model Masked Pretraining on Images and Videos

1 code implementation CVPR 2023 Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

Furthermore, this model can be learned by dropping 90% of the image and 95% of the video patches, enabling extremely fast training of huge model architectures.

Improving Wikipedia Verifiability with AI

1 code implementation 8 Jul 2022 Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel

Hence, maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort.

Citation Recommendation, Fact Checking

Atlas: Few-shot Learning with Retrieval Augmented Language Models

1 code implementation 5 Aug 2022 Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave

Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings.

Fact Checking, Few-Shot Learning, +6

ImageBind: One Embedding Space To Bind Them All

1 code implementation CVPR 2023 Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

We show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together.

Cross-Modal Retrieval, Retrieval, +7

PaSS: Parallel Speculative Sampling

no code implementations 22 Nov 2023 Giovanni Monea, Armand Joulin, Edouard Grave

As an alternative, we propose to use parallel decoding as a way to draft multiple tokens from a single model, with no additional computational cost and no need for a second model.

Scalable Pre-training of Large Autoregressive Image Models

2 code implementations 16 Jan 2024 Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data; (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #332 on Image Classification on ImageNet (using extra training data)

Image Classification

Gemma: Open Models Based on Gemini Research and Technology

no code implementations 13 Mar 2024 Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Pier Giuseppe Sessa, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross Mcilroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.
