no code implementations • ICLR 2019 • Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin
This paper extends this line of work to the problem of aligning multiple languages to a common space.
1 code implementation • 31 Jul 2024 • Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, Shantanu Thakoor, Jean-bastien Grill, Behnam Neyshabur, Olivier Bachem, Alanna Walton, Aliaksei Severyn, Alicia Parrish, Aliya Ahmad, Allen Hutchison, Alvin Abdagic, Amanda Carl, Amy Shen, Andy Brock, Andy Coenen, Anthony Laforge, Antonia Paterson, Ben Bastian, Bilal Piot, Bo Wu, Brandon Royal, Charlie Chen, Chintu Kumar, Chris Perry, Chris Welty, Christopher A. Choquette-Choo, Danila Sinopalnikov, David Weinberger, Dimple Vijaykumar, Dominika Rogozińska, Dustin Herbison, Elisa Bandy, Emma Wang, Eric Noland, Erica Moreira, Evan Senter, Evgenii Eltyshev, Francesco Visin, Gabriel Rasskin, Gary Wei, Glenn Cameron, Gus Martins, Hadi Hashemi, Hanna Klimczak-Plucińska, Harleen Batra, Harsh Dhand, Ivan Nardini, Jacinda Mein, Jack Zhou, James Svensson, Jeff Stanway, Jetha Chan, Jin Peng Zhou, Joana Carrasqueira, Joana Iljazi, Jocelyn Becker, Joe Fernandez, Joost van Amersfoort, Josh Gordon, Josh Lipschultz, Josh Newlan, Ju-yeong Ji, Kareem Mohamed, Kartikeya Badola, Kat Black, Katie Millican, Keelin McDonell, Kelvin Nguyen, Kiranbir Sodhia, Kish Greene, Lars Lowe Sjoesund, Lauren Usui, Laurent SIfre, Lena Heuermann, Leticia Lago, Lilly McNealus, Livio Baldini Soares, Logan Kilpatrick, Lucas Dixon, Luciano Martins, Machel Reid, Manvinder Singh, Mark Iverson, Martin Görner, Mat Velloso, Mateo Wirth, Matt Davidow, Matt Miller, Matthew Rahtz, Matthew Watson, Meg Risdal, Mehran Kazemi, Michael Moynihan, Ming Zhang, Minsuk Kahng, Minwoo Park, Mofi Rahman, Mohit Khatwani, Natalie Dao, Nenshad Bardoliwalla, Nesh Devanathan, Neta Dumai, Nilay Chauhan, Oscar Wahltinez, Pankil Botarda, Parker Barnes, Paul Barham, Paul Michel, Pengchong Jin, Petko Georgiev, Phil Culliton, Pradeep Kuppala, Ramona Comanescu, Ramona Merhej, Reena Jana, Reza Ardeshir Rokni, Rishabh Agarwal, Ryan Mullins, Samaneh Saadat, Sara Mc Carthy, Sarah Cogan, Sarah Perrin, Sébastien M. R. Arnold, Sebastian Krause, Shengyang Dai, Shruti Garg, Shruti Sheth, Sue Ronstrom, Susan Chan, Timothy Jordan, Ting Yu, Tom Eccles, Tom Hennigan, Tomas Kocisky, Tulsee Doshi, Vihan Jain, Vikas Yadav, Vilobh Meshram, Vishal Dharmadhikari, Warren Barkley, Wei Wei, Wenming Ye, Woohyun Han, Woosuk Kwon, Xiang Xu, Zhe Shen, Zhitao Gong, Zichuan Wei, Victor Cotruta, Phoebe Kirk, Anand Rao, Minh Giang, Ludovic Peran, Tris Warkentin, Eli Collins, Joelle Barral, Zoubin Ghahramani, Raia Hadsell, D. Sculley, Jeanine Banks, Anca Dragan, Slav Petrov, Oriol Vinyals, Jeff Dean, Demis Hassabis, Koray Kavukcuoglu, Clement Farabet, Elena Buchatskaya, Sebastian Borgeaud, Noah Fiedel, Armand Joulin, Kathleen Kenealy, Robert Dadashi, Alek Andreev
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters.
no code implementations • 2 Jul 2024 • Xavier Suau, Pieter Delobelle, Katherine Metcalf, Armand Joulin, Nicholas Apostoloff, Luca Zappella, Pau Rodríguez
We also show that AurA is effective with models of different scales (from 1.5B to 40B parameters), and that its effectiveness in mitigating toxic language, while preserving common-sense zero-shot abilities, holds across all scales.
1 code implementation • 6 Jun 2024 • Xiou Ge, Ali Mousavi, Edouard Grave, Armand Joulin, Kun Qian, Benjamin Han, Mostafa Arefiyan, Yunyao Li
It is thus essential to design effective methods to both update obsolete knowledge and induce new knowledge into LLMs.
1 code implementation • 24 May 2024 • Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski
This manual process has limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, which prevents scaling up the dataset size.
no code implementations • 2 May 2024 • Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann Lecun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, Maria Vakalopoulou
AI Foundation models are gaining traction in various applications, including medical fields like radiology.
1 code implementation • 11 Apr 2024 • Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent SIfre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz GUStavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Frietas
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture.
2 code implementations • 13 Mar 2024 • Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent SIfre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross Mcilroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.
2 code implementations • 16 Jan 2024 • Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin
Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks.
Ranked #368 on Image Classification on ImageNet
no code implementations • 22 Nov 2023 • Giovanni Monea, Armand Joulin, Edouard Grave
As an alternative, we propose to use parallel decoding to draft multiple tokens from a single model, at no additional computational cost and without the need for a second model.
3 code implementations • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
We show that not all combinations of paired data are necessary to train such a joint embedding; image-paired data alone is sufficient to bind the modalities together.
Ranked #2 on Zero-shot Classification (unified classes) on LLVIP
21 code implementations • 14 Apr 2023 • Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
Ranked #1 on Image Retrieval on AmsterTime (using extra training data)
1 code implementation • ICCV 2023 • Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.
Ranked #1 on Few-Shot Image Classification on ImageNet - 10-shot (using extra training data)
52 code implementations • arXiv 2023 • Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
Ranked #3 on Few-Shot Learning on MedConceptsQA
1 code implementation • 5 Aug 2022 • Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave
Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings.
Ranked #1 on Question Answering on Natural Questions
1 code implementation • 8 Jul 2022 • Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel
Hence, maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort.
1 code implementation • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
Furthermore, this model can be learned by dropping 90% of the image and 95% of the video patches, enabling extremely fast training of huge model architectures.
2 code implementations • 14 Apr 2022 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations.
Self-Supervised Image Classification, Self-Supervised Learning, +1
1 code implementation • 16 Feb 2022 • Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski
Discriminative self-supervised learning allows training models on any random group of internet images, possibly recovering salient information that helps differentiate between the images.
Ranked #1 on Copy Detection on Copydays strong subset (using extra training data)
2 code implementations • CVPR 2022 • Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra
Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data.
Ranked #1 on Scene Recognition on SUN-RGBD (using extra training data)
1 code implementation • 7 Jan 2022 • Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra
For the first time, we train a detector with all the twenty-one-thousand classes of the ImageNet dataset and show that it generalizes to new datasets without finetuning.
Ranked #1 on Cross-Domain Few-Shot Object Detection on NEU-DET
Cross-Domain Few-Shot Object Detection, Image Classification, +1
5 code implementations • 27 Dec 2021 • Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
Ranked #40 on Semantic Segmentation on ADE20K val
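A minimal sketch of the idea in the entry above: a learned query token attends over the flattened feature map of an arbitrary convolutional trunk, producing a global, non-locally aggregated representation. The toy trunk, layer sizes, and head count are illustrative assumptions, not the paper's architecture.

```python
# Sketch: attention-based global map on top of any convolutional trunk.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, dim))  # learned global query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) output of a convolutional trunk
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)          # (B, H*W, C)
        q = self.query.expand(b, -1, -1)                   # (B, 1, C)
        pooled, _ = self.attn(q, tokens, tokens)           # cross-attention over all positions
        return self.norm(pooled.squeeze(1))                # (B, C) global representation

trunk = nn.Sequential(nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU())
head = AttentionPooling(dim=64)
out = head(trunk(torch.randn(2, 3, 224, 224)))             # (2, 64)
```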
6 code implementations • 16 Dec 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.
1 code implementation • 29 Oct 2021 • Xi Shen, Alexei A. Efros, Armand Joulin, Mathieu Aubry
The goal of this work is to efficiently identify visually similar patterns in images, e.g., identifying an artwork detail copied between an engraving and an oil painting, or recognizing parts of a night-time photograph visible in its daytime counterpart.
no code implementations • 29 Sep 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
By contrast, in many other NLP tasks, conventional self-supervised pre-training based on masking leads to strong generalization with a small number of training examples.
1 code implementation • ICCV 2021 • Ishan Misra, Rohit Girdhar, Armand Joulin
We propose 3DETR, an end-to-end Transformer based object detection model for 3D point clouds.
Ranked #20 on 3D Object Detection on SUN-RGBD val
11 code implementations • NeurIPS 2021 • Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.
Ranked #58 on Instance Segmentation on COCO minival
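A minimal sketch of the cross-covariance ("transposed") attention described in the entry above: keys and queries are L2-normalized along the token axis and attention is computed between feature channels rather than tokens. The learned per-head temperature and the dimensions are assumptions.

```python
# Sketch: channels attend to channels via the key/query cross-covariance matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

class XCAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) token embeddings
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, c // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)                 # each (B, H, d, N)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, H, d, d) cross-covariance
        attn = attn.softmax(dim=-1)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(b, n, c)
        return self.proj(out)

x = torch.randn(2, 196, 64)
print(XCAttention(64, num_heads=8)(x).shape)                 # torch.Size([2, 196, 64])
```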
17 code implementations • NeurIPS 2021 • Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
Ranked #1 on Image Classification on Certificate Verification
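A minimal sketch of one residual block in the spirit of the ResMLP entry above: an affine normalization, a linear layer that mixes patches, and an MLP that mixes channels. The sizes and the omission of per-branch scaling are simplifying assumptions.

```python
# Sketch: a ResMLP-style block built only from affine transforms and linear layers.
import torch
import torch.nn as nn

class Affine(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    def __init__(self, num_patches: int, dim: int):
        super().__init__()
        self.norm1, self.norm2 = Affine(dim), Affine(dim)
        self.cross_patch = nn.Linear(num_patches, num_patches)  # mixes patches
        self.cross_channel = nn.Sequential(                     # mixes channels
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # x: (B, num_patches, dim)
        x = x + self.cross_patch(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.cross_channel(self.norm2(x))
        return x

x = torch.randn(2, 196, 384)
print(ResMLPBlock(196, 384)(x).shape)   # torch.Size([2, 196, 384])
```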
31 code implementations • ICCV 2021 • Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).
Ranked #2 on Copy Detection on Copydays strong subset
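A minimal sketch of a DINO-style self-distillation objective consistent with the entry above: the student is trained to match a centered and sharpened teacher distribution over projection-head outputs. The temperatures and the centering momentum are illustrative assumptions.

```python
# Sketch: cross-entropy between a sharpened/centered teacher and the student.
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, t_s=0.1, t_t=0.04):
    # student_out, teacher_out: (B, K) projection-head outputs for two views
    teacher_probs = F.softmax((teacher_out - center) / t_t, dim=-1).detach()
    student_logp = F.log_softmax(student_out / t_s, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()

B, K = 8, 4096
center = torch.zeros(K)
s, t = torch.randn(B, K), torch.randn(B, K)
loss = dino_loss(s, t, center)
center = 0.9 * center + 0.1 * t.mean(dim=0)   # EMA update of the center
print(loss.item())
```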
4 code implementations • ICCV 2021 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat
This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS).
11 code implementations • ICCV 2021 • Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze
We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime.
Ranked #15 on Image Classification on Stanford Cars
1 code implementation • 2 Mar 2021 • Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods.
Ranked #6 on Image Classification on Places205
Self-Supervised Image Classification, Self-Supervised Learning, +1
1 code implementation • ICCV 2021 • Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra
Pretraining on large labeled datasets is a prerequisite for achieving good performance in many computer vision tasks such as 2D object recognition and video classification.
8 code implementations • 21 Oct 2020 • Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Marie-Anne Lachaux, Armand Joulin, Guillaume Lample
In this paper, we propose to explicitly model this one-to-many mapping by conditioning the decoder of an NMT model on a latent variable that represents the domain of target sentences.
18 code implementations • NeurIPS 2020 • Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much.
Ranked #9 on Image Classification on OmniBenchmark
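A minimal sketch of the multi-crop augmentation described above: a few high-resolution global crops plus several low-resolution local crops per image. The crop sizes and scale ranges are assumptions.

```python
# Sketch: global + local crops at different resolutions for one image.
from PIL import Image
from torchvision import transforms

def multi_crop(img: Image.Image, n_global=2, n_local=6):
    global_t = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()])
    local_t = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.4)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()])
    return ([global_t(img) for _ in range(n_global)] +
            [local_t(img) for _ in range(n_local)])

crops = multi_crop(Image.new("RGB", (256, 256)))
print([tuple(c.shape) for c in crops])   # 2 crops of 3x224x224 and 6 of 3x96x96
```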
4 code implementations • ICLR 2021 • Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin
A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
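A minimal sketch of the standard Quantization Aware Training recipe described in the snippet above (not this paper's own proposal): weights are fake-quantized in the forward pass while the straight-through estimator lets gradients pass through as if the rounding were the identity. The int8-style scheme is an assumption.

```python
# Sketch: fake quantization with a straight-through estimator.
import torch

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale   # quantize / dequantize
    # Straight-through estimator: forward uses w_q, backward treats it as identity.
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
loss = (fake_quantize(w) ** 2).sum()
loss.backward()
print(w.grad.shape)   # gradients flow to the full-precision weights
```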
no code implementations • 10 Apr 2020 • Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski
Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.
4 code implementations • 21 Feb 2020 • Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.
3 code implementations • 7 Feb 2020 • Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux
Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting.
Automatic Speech Recognition (ASR), +1
no code implementations • 10 Jan 2020 • Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin
In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e., on self-supervised tasks).
2 code implementations • 17 Dec 2019 • Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux
Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).
Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-within metric)
3 code implementations • ACL 2021 • Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin
To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets.
2 code implementations • LREC 2020 • Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave
Pre-training text representations have led to significant improvements in many areas of natural language processing.
no code implementations • 14 Oct 2019 • Piotr Bojanowski, Onur Celebi, Tomas Mikolov, Edouard Grave, Armand Joulin
In this paper, we focus on the problem of adapting word vector-based models to new textual data.
5 code implementations • ICLR 2020 • Angela Fan, Edouard Grave, Armand Joulin
Overparameterized transformer networks have obtained state of the art results in various natural language processing tasks, such as machine translation, language modeling, and question answering.
Ranked #5 on Open-Domain Question Answering on ELI5
no code implementations • 25 Sep 2019 • Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin
The lottery ticket hypothesis argues that neural networks contain sparse subnetworks, which, if appropriately initialized (the winning tickets), are capable of matching the accuracy of the full network when trained in isolation.
no code implementations • 25 Sep 2019 • Gautier Izacard, Armand Joulin, Edouard Grave
It is closely related to the problem of online learning of language models.
1 code implementation • 22 Jul 2019 • Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston
In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.
3 code implementations • ICLR 2020 • Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou
In this paper, we address the problem of reducing the memory footprint of convolutional network architectures.
2 code implementations • 2 Jul 2019 • Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin
More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer.
Ranked #5 on Language Modelling on Text8
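A minimal sketch of self-attention augmented with learned persistent key/value vectors, as described in the entry above: the persistent slots are shared across positions and attended to alongside the regular tokens. The number of memory slots and the head configuration are assumptions.

```python
# Sketch: persistent memory vectors concatenated to the attention keys/values.
import torch
import torch.nn as nn

class PersistentMemoryAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, num_mem: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mem_k = nn.Parameter(torch.randn(1, num_mem, dim) * 0.02)
        self.mem_v = nn.Parameter(torch.randn(1, num_mem, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim); keys/values are the tokens plus the persistent slots
        b = x.size(0)
        k = torch.cat([x, self.mem_k.expand(b, -1, -1)], dim=1)
        v = torch.cat([x, self.mem_v.expand(b, -1, -1)], dim=1)
        out, _ = self.attn(x, k, v)
        return out

x = torch.randn(2, 10, 64)
print(PersistentMemoryAttention(64)(x).shape)   # torch.Size([2, 10, 64])
```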
8 code implementations • ACL 2019 • Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
We propose a novel self-attention mechanism that can learn its optimal attention span.
Ranked #4 on Language Modelling on Text8
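A minimal sketch of one way to realize a learnable attention span consistent with the entry above: attention weights at distance d are scaled by a soft ramp whose width is controlled by a learned parameter. The ramp length and the renormalization step are assumptions.

```python
# Sketch: a learned soft mask over past positions that shrinks or widens the span.
import torch
import torch.nn as nn

class AdaptiveSpanMask(nn.Module):
    def __init__(self, max_span: int, ramp: int = 32):
        super().__init__()
        self.max_span, self.ramp = max_span, ramp
        self.z = nn.Parameter(torch.zeros(1))   # learned span, kept in [0, max_span]

    def forward(self, attn_scores: torch.Tensor) -> torch.Tensor:
        # attn_scores: (..., max_span) scores over past positions, nearest first
        span = self.z.clamp(0, self.max_span)
        dist = torch.arange(1, self.max_span + 1, device=attn_scores.device)
        mask = ((self.ramp + span - dist) / self.ramp).clamp(0, 1)
        masked = attn_scores.softmax(dim=-1) * mask
        return masked / masked.sum(dim=-1, keepdim=True).clamp(min=1e-8)

m = AdaptiveSpanMask(max_span=128)
print(m(torch.randn(2, 4, 128)).shape)   # torch.Size([2, 4, 128])
```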
2 code implementations • ICCV 2019 • Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin
Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available.
Ranked #65 on Self-Supervised Image Classification on ImageNet (finetuned) (using extra training data)
1 code implementation • NAACL 2019 • Serhii Havrylov, Germán Kruszewski, Armand Joulin
There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics.
no code implementations • 2 Nov 2018 • Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin
This paper extends this line of work to the problem of aligning multiple languages to a common space.
9 code implementations • ECCV 2018 • Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.
Ranked #5 on Unsupervised Semantic Segmentation on ImageNet-S-50
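A minimal sketch of the alternation behind DeepCluster as described above: cluster the current features with k-means, then use the cluster ids as pseudo-labels to train the network. The toy encoder, data, and hyper-parameters are assumptions.

```python
# Sketch: alternate between k-means on features and supervised training on pseudo-labels.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 64), nn.ReLU())
classifier = nn.Linear(64, 10)          # re-initialized at each clustering step in practice
opt = torch.optim.SGD(list(encoder.parameters()) + list(classifier.parameters()), lr=0.01)
images = torch.randn(256, 3, 32, 32)    # unlabeled images

for epoch in range(3):
    with torch.no_grad():               # step 1: cluster the current features
        feats = encoder(images)
    labels = KMeans(n_clusters=10, n_init=10).fit_predict(feats.numpy())
    pseudo_labels = torch.from_numpy(labels).long()
    for i in range(0, len(images), 64): # step 2: train on the pseudo-labels
        x, y = images[i:i + 64], pseudo_labels[i:i + 64]
        loss = nn.functional.cross_entropy(classifier(encoder(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```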
3 code implementations • 29 May 2018 • Edouard Grave, Armand Joulin, Quentin Berthet
A library for Multilingual Unsupervised or Supervised word Embeddings
4 code implementations • EMNLP 2018 • Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave
Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space.
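A minimal sketch of aligning two monolingual embedding spaces with an orthogonal Procrustes mapping estimated from a seed dictionary. This illustrates the common-space alignment idea in the entry above, not the paper's retrieval-based (RCSLS) criterion; the synthetic data are assumptions.

```python
# Sketch: orthogonal Procrustes alignment of two embedding spaces.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))             # source-language vectors (seed pairs)
true_R = np.linalg.qr(rng.normal(size=(300, 300)))[0]
Y = X @ true_R                               # target-language vectors of the same words

# W = argmin ||XW - Y||_F subject to W orthogonal, via the SVD of X^T Y
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt
print(np.allclose(X @ W, Y, atol=1e-6))      # True: source vectors now live in the target space
```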
2 code implementations • LREC 2018 • Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov
Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance.
Ranked #12 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
5 code implementations • LREC 2018 • Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin
Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl.
2 code implementations • NeurIPS 2017 • Edouard Grave, Moustapha Cisse, Armand Joulin
Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution.
1 code implementation • 30 Oct 2017 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov
This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings.
6 code implementations • ICML 2018 • Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam
Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images.
1 code implementation • ICML 2017 • Piotr Bojanowski, Armand Joulin
We propose to fix a set of target representations, called Noise As Targets (NAT), and to constrain the deep features to align to them.
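A minimal sketch of the Noise-As-Targets idea described above: fix random targets on the unit sphere, assign one target per image (here via Hungarian matching within a batch), and pull features toward their assigned targets. The encoder, the batch-level matching, and the data are assumptions.

```python
# Sketch: align features to a fixed set of noise targets with a 1-to-1 assignment.
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

n, dim = 64, 32
targets = F.normalize(torch.randn(n, dim), dim=1)        # fixed noise targets
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, dim))
opt = torch.optim.SGD(encoder.parameters(), lr=0.1)
images = torch.randn(n, 3, 16, 16)

for step in range(5):
    feats = F.normalize(encoder(images), dim=1)
    cost = -(feats @ targets.T).detach().numpy()          # maximize dot products
    rows, cols = linear_sum_assignment(cost)              # 1-to-1 feature/target matching
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    loss = -(feats[rows] * targets[cols]).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final alignment loss: {loss.item():.3f}")
```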
no code implementations • 31 Jan 2017 • Marco Baroni, Armand Joulin, Allan Jabri, Germàn Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov
With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal.
no code implementations • ICCV 2017 • Ang Li, Allan Jabri, Armand Joulin, Laurens van der Maaten
Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts.
Ranked #2 on Zero-Shot Transfer Image Classification on aYahoo
14 code implementations • 13 Dec 2016 • Edouard Grave, Armand Joulin, Nicolas Usunier
We propose an extension to neural network language models to adapt their prediction to the recent history.
Ranked #33 on Language Modelling on WikiText-2
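A minimal sketch of a cache-augmented language model in the spirit of the entry above: recent (hidden state, next word) pairs define a cache distribution that is interpolated with the base model's prediction. The flatness parameter theta and the mixing weight lam are illustrative assumptions.

```python
# Sketch: interpolate a model distribution with a cache built from recent history.
import torch
import torch.nn.functional as F

def cache_probs(h_t, cache_h, cache_words, vocab_size, theta=0.3):
    # h_t: (d,) current hidden state; cache_h: (T, d); cache_words: (T,) word ids
    scores = F.softmax(theta * cache_h @ h_t, dim=0)    # match the current state to history
    p = torch.zeros(vocab_size)
    p.index_add_(0, cache_words, scores)                # probability mass on recent words
    return p

def mix(p_model, p_cache, lam=0.2):
    return (1 - lam) * p_model + lam * p_cache

V, d, T = 50, 16, 8
p_model = F.softmax(torch.randn(V), dim=0)
p = mix(p_model, cache_probs(torch.randn(d), torch.randn(T, d),
                             torch.randint(0, V, (T,)), V))
print(p.sum())   # tensor(1.): still a valid distribution
```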
43 code implementations • 12 Dec 2016 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hérve Jégou, Tomas Mikolov
We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.
no code implementations • 18 Nov 2016 • Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov
Recurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data.
12 code implementations • ICML 2017 • Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies.
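PyTorch ships an adaptive-softmax criterion (nn.AdaptiveLogSoftmaxWithLoss) implementing this kind of approximation: frequent words live in a small head cluster and rare words in lower-capacity tail clusters. The vocabulary size and cutoffs below are illustrative assumptions.

```python
# Sketch: training-time usage of PyTorch's adaptive softmax criterion.
import torch
import torch.nn as nn

vocab_size, hidden = 100_000, 512
criterion = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden, n_classes=vocab_size,
    cutoffs=[2000, 20000], div_value=4.0)     # head: 2k frequent words, two tail clusters

hidden_states = torch.randn(32, hidden)       # (batch, hidden) from an RNN or transformer
targets = torch.randint(0, vocab_size, (32,))
out = criterion(hidden_states, targets)
print(out.loss)                               # average negative log-likelihood
```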
53 code implementations • TACL 2017 • Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov
A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations.
Ranked #3 on Word Similarity on WS353
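A minimal sketch of the subword scheme described above: a word vector is the sum of vectors for its character n-grams (with boundary symbols), and n-grams are hashed into a fixed-size table. The bucket count, n-gram range, and table size are toy assumptions.

```python
# Sketch: word vector = sum of hashed character n-gram vectors.
import numpy as np

def char_ngrams(word: str, n_min: int = 3, n_max: int = 6):
    w = f"<{word}>"                                   # boundary symbols
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

rng = np.random.default_rng(0)
buckets, dim = 100_000, 100                           # toy hash-table size
ngram_table = rng.normal(scale=0.1, size=(buckets, dim))

def word_vector(word: str) -> np.ndarray:
    ids = [hash(g) % buckets for g in char_ngrams(word)]
    return ngram_table[ids].sum(axis=0)               # sum of n-gram vectors

print(char_ngrams("where")[:4])     # ['<wh', 'whe', 'her', 'ere']
print(word_vector("where").shape)   # (100,)
```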
64 code implementations • EACL 2017 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
This paper explores a simple and efficient baseline for text classification.
Ranked #1 on Sentiment Analysis on Sogou News
Emotion Recognition in Conversation, General Classification, +2
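A minimal sketch of a bag-of-embeddings baseline in the spirit of the entry above: average the embeddings of a document's tokens and feed the result to a linear classifier (the paper additionally uses n-gram features and a hierarchical softmax). Vocabulary, dimensions, and data are toy assumptions.

```python
# Sketch: averaged word embeddings followed by a linear classifier.
import torch
import torch.nn as nn

class BagOfWordsClassifier(nn.Module):
    def __init__(self, vocab_size: int, dim: int, num_classes: int):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.fc(self.embed(token_ids, offsets))

model = BagOfWordsClassifier(vocab_size=10_000, dim=50, num_classes=4)
tokens = torch.tensor([3, 17, 256, 9, 42, 7])   # two documents, concatenated
offsets = torch.tensor([0, 3])                  # doc 1 = tokens[0:3], doc 2 = tokens[3:]
print(model(tokens, offsets).shape)             # torch.Size([2, 4]) class scores
```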
3 code implementations • 27 Jun 2016 • Allan Jabri, Armand Joulin, Laurens van der Maaten
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding.
no code implementations • 7 Jun 2016 • Marius Cătălin Iordan, Armand Joulin, Diane M. Beck, Li Fei-Fei
Our method outperforms the two most commonly used alternatives (anatomical landmark-based AFNI alignment and cortical convexity-based FreeSurfer alignment) in overlap between predicted region and functionally-defined LOC.
1 code implementation • 25 Nov 2015 • Tomas Mikolov, Armand Joulin, Marco Baroni
The development of intelligent machines is one of the biggest unsolved challenges in computer science.
1 code implementation • 23 Nov 2015 • Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus
We present an approach for learning simple algorithms such as copying, multi-digit addition and single digit multiplication directly from examples.
1 code implementation • 19 Nov 2015 • Piotr Bojanowski, Armand Joulin, Tomas Mikolov
The first one consists in conditioning the character-level representation on the previous word representation.
no code implementations • 6 Nov 2015 • Armand Joulin, Laurens van der Maaten, Allan Jabri, Nicolas Vasilache
We train convolutional networks on a dataset of 100 million Flickr photos and captions, and show that these networks produce features that perform well in a range of vision problems.
4 code implementations • NeurIPS 2015 • Armand Joulin, Tomas Mikolov
Despite the recent achievements in machine learning, we are still very far from achieving real artificial intelligence.
20 code implementations • 19 Feb 2015 • Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov
One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent.
5 code implementations • 24 Dec 2014 • Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato
In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent.
no code implementations • NeurIPS 2014 • Andrej Karpathy, Armand Joulin, Li Fei-Fei
We introduce a model for bidirectional retrieval of images and sentences through a multi-modal embedding of visual and natural language data.
Ranked #13 on Referring Expression Comprehension on Talk2Car
no code implementations • CVPR 2014 • Kevin Tang, Armand Joulin, Li-Jia Li, Li Fei-Fei
In this paper, we tackle the problem of co-localization in real-world images.
no code implementations • CVPR 2013 • Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu
In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as typical for datasets collected from Internet search.
no code implementations • CVPR 2013 • Armand Joulin, Sing Bing Kang
An anaglyph is a single image created by selecting complementary colors from a stereo color pair; the user can perceive depth by viewing it through color-filtered glasses.
no code implementations • NeurIPS 2010 • Armand Joulin, Jean Ponce, Francis R. Bach
To avoid this problem, we introduce a local approximation of this cost function, which leads to a quadratic non-convex optimization problem over a product of simplices.