2 code implementations • 12 Feb 2025 • Timothée Darcet, Federico Baldassarre, Maxime Oquab, Julien Mairal, Piotr Bojanowski
Masked Image Modeling (MIM) offers a promising approach to self-supervised representation learning; however, existing MIM models still lag behind the state of the art.
no code implementations • 20 Dec 2024 • Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski
Self-supervised visual foundation models produce powerful embeddings that achieve remarkable performance on a wide range of downstream tasks.
Open Vocabulary Semantic Segmentation
Open-Vocabulary Semantic Segmentation
+2
no code implementations • 13 Jun 2024 • Théo Moutakanni, Maxime Oquab, Marc Szafraniec, Maria Vakalopoulou, Piotr Bojanowski
Self-supervised learning (SSL) with Joint-Embedding Architectures (JEA) has led to outstanding performance.
1 code implementation • 24 May 2024 • Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski
This manual process has limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, which prevents scaling the dataset size.
no code implementations • 2 May 2024 • Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann Lecun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, Maria Vakalopoulou
AI Foundation models are gaining traction in various applications, including medical fields like radiology.
5 code implementations • 28 Sep 2023 • Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski
Transformers have recently emerged as a powerful tool for learning visual representations.
Ranked #1 on Self-Supervised Image Classification on ImageNet (using extra training data)
no code implementations • 18 Apr 2023 • Lina Mezghani, Piotr Bojanowski, Karteek Alahari, Sainbayar Sukhbaatar
The success of transformer models trained with a language modeling objective brings a promising opportunity to the reinforcement learning framework.
1 code implementation • 14 Apr 2023 • Jamie Tolan, Hung-I Yang, Ben Nosarzewski, Guillaume Couairon, Huy Vo, John Brandt, Justine Spore, Sayantan Majumdar, Daniel Haziza, Janaki Vamaraju, Theo Moutakanni, Piotr Bojanowski, Tracy Johns, Brian White, Tobias Tiecke, Camille Couprie
The maps are generated by the extraction of features from a self-supervised model trained on Maxar imagery from 2017 to 2020, and the training of a dense prediction decoder against aerial lidar maps.
22 code implementations • 14 Apr 2023 • Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
Ranked #1 on Image Retrieval on AmsterTime (using extra training data)
7 code implementations • CVPR 2023 • Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann Lecun, Nicolas Ballas
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations.
1 code implementation • 5 Jan 2023 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari
Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming.
1 code implementation • CVPR 2023 • Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou
Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: i.e., activating only a subset of the layers and skipping the others.
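The submodel mechanism can be sketched in a few lines; the toy residual blocks, sizes, and keep probability below are illustrative stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual MLP: each "layer" is a residual block y = x + W @ x.
weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(6)]

def forward(x, keep_mask):
    # Stochastic depth: a block is applied only if its mask entry is True.
    for W, keep in zip(weights, keep_mask):
        if keep:
            x = x + W @ x
    return x

x = rng.normal(size=8)
keep_prob = 0.5
# Two independently sampled "submodels" for the same input sample.
mask_a = rng.random(len(weights)) < keep_prob
mask_b = rng.random(len(weights)) < keep_prob
out_a, out_b = forward(x, mask_a), forward(x, mask_b)
# A co-training loss would then pull out_a and out_b together:
consistency = float(np.sum((out_a - out_b) ** 2))
```

Plain stochastic depth trains each sampled subnetwork independently; the co-training variant adds a consistency term between the two submodel outputs.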
1 code implementation • 9 Dec 2022 • Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou
We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth.
Ranked #63 on Image Classification on ImageNet
1 code implementation • 13 Oct 2022 • Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas
A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN).
no code implementations • 23 Jun 2022 • Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Karteek Alahari
Finally, we train a goal-conditioned policy network with goals sampled from the goal memory and reward it by the reachability network and the goal memory.
2 code implementations • 14 Apr 2022 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations.
Self-Supervised Image Classification
Self-Supervised Learning
+1
1 code implementation • 16 Feb 2022 • Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski
Discriminative self-supervised learning allows training models on any random group of internet images and can recover salient information that helps differentiate between them.
Ranked #1 on Copy Detection on Copydays strong subset (using extra training data)
5 code implementations • 27 Dec 2021 • Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
Ranked #41 on Semantic Segmentation on ADE20K val
6 code implementations • 16 Dec 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.
Ranked #5 on Sentence Retrieval on PeerQA
no code implementations • 29 Sep 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
By contrast, in many other NLP tasks, conventional self-supervised pre-training based on masking leads to strong generalization with a small number of training examples.
12 code implementations • NeurIPS 2021 • Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.
Ranked #58 on Instance Segmentation on COCO minival
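A minimal NumPy sketch of such channel-wise ("cross-covariance") attention, assuming a single head and hypothetical projection matrices; the real XCiT adds multiple heads, a learned temperature, and local patch-interaction blocks:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def xca(x, wq, wk, wv, tau=1.0):
    # x: (N tokens, d channels). Project to queries/keys/values.
    q, k, v = x @ wq, x @ wk, x @ wv
    # L2-normalize each channel (column) so the d x d cross-covariance
    # between keys and queries stays bounded.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    attn = softmax((k.T @ q) / tau, axis=0)   # (d, d): channel-to-channel
    return v @ attn                           # mixes channels, not tokens

rng = np.random.default_rng(0)
n, d = 16, 8
x = rng.normal(size=(n, d))
w = [rng.normal(scale=0.1, size=(d, d)) for _ in range(3)]
out = xca(x, *w)
```

Because the attention map is d x d rather than N x N, the cost grows linearly with the number of tokens.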
19 code implementations • NeurIPS 2021 • Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
Ranked #1 on Image Classification on Certificate Verification
32 code implementations • ICCV 2021 • Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).
Ranked #2 on Copy Detection on Copydays strong subset
4 code implementations • ICCV 2021 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat
This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS).
1 code implementation • 2 Mar 2021 • Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods.
Ranked #6 on Image Classification on Places205
Self-Supervised Image Classification
Self-Supervised Learning
+1
1 code implementation • 13 Jan 2021 • Lina Mezghani, Sainbayar Sukhbaatar, Thibaut Lavril, Oleksandr Maksymets, Dhruv Batra, Piotr Bojanowski, Karteek Alahari
In this work, we present a memory-augmented approach for image-goal navigation.
18 code implementations • NeurIPS 2020 • Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much.
Ranked #9 on Image Classification on OmniBenchmark
no code implementations • 10 Apr 2020 • Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski
Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.
no code implementations • 10 Jan 2020 • Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin
In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e., on self-supervised tasks).
no code implementations • 14 Oct 2019 • Piotr Bojanowski, Onur Celebi, Tomas Mikolov, Edouard Grave, Armand Joulin
In this paper, we focus on the problem of adapting word vector-based models to new textual data.
no code implementations • 25 Sep 2019 • Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin
The lottery ticket hypothesis argues that neural networks contain sparse subnetworks, which, if appropriately initialized (the winning tickets), are capable of matching the accuracy of the full network when trained in isolation.
no code implementations • ACL 2019 • Edouard Grave, Sainbayar Sukhbaatar, Piotr Bojanowski, Armand Joulin
In this paper, we study the problem of hybrid language modeling, that is, using models which can predict both characters and larger units such as character n-grams or words.
2 code implementations • NAACL 2019 • Bora Edizel, Aleksandra Piktus, Piotr Bojanowski, Rui Ferreira, Edouard Grave, Fabrizio Silvestri
In this paper we present a method to learn word embeddings that are resilient to misspellings.
8 code implementations • ACL 2019 • Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
We propose a novel self-attention mechanism that can learn its optimal attention span.
Ranked #4 on Language Modelling on Text8
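The core of the adaptive span is a soft mask over attention distances; this is a sketch of that masking function with an illustrative ramp width:

```python
import numpy as np

def span_mask(distances, z, ramp=8):
    # Soft masking from the adaptive-span idea: fully attend within
    # distance z, decay linearly over a ramp of the given width, and
    # ignore anything farther away. z is learned during training.
    return np.clip((ramp + z - distances) / ramp, 0.0, 1.0)

d = np.arange(64)          # distance of each past position
m = span_mask(d, z=20)
# Attention weights would be multiplied by m and renormalized; positions
# beyond z + ramp get exactly zero weight, so their keys can be skipped.
```

Because the mask is differentiable in z, each head can shrink or grow its own span under a penalty on the total span.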
2 code implementations • ICCV 2019 • Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin
Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available.
Ranked #65 on Self-Supervised Image Classification on ImageNet (finetuned) (using extra training data)
9 code implementations • ECCV 2018 • Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.
Ranked #5 on Unsupervised Semantic Segmentation on ImageNet-S-50
4 code implementations • EMNLP 2018 • Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave
Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space.
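A classic way to compute such an alignment from a small bilingual dictionary is orthogonal Procrustes; the sizes and simulated dictionary below are illustrative, and the paper itself argues for a retrieval-based criterion rather than this least-squares baseline:

```python
import numpy as np

rng = np.random.default_rng(0)

def procrustes(x, y):
    # Orthogonal W minimizing ||x @ W - y||_F: the closed-form solution
    # commonly used to align two word-embedding spaces.
    u, _, vt = np.linalg.svd(x.T @ y)
    return u @ vt

d, n = 16, 100
# Simulated "bilingual dictionary": target vectors are a rotation of
# the source vectors plus a little noise.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # ground-truth rotation
src = rng.normal(size=(n, d))
tgt = src @ q + 0.01 * rng.normal(size=(n, d))

w = procrustes(src, tgt)
err = np.linalg.norm(src @ w - tgt) / np.linalg.norm(tgt)
```

Constraining W to be orthogonal preserves distances within the source space, which is why nearest-neighbor retrieval still works after mapping.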
2 code implementations • NAACL 2018 • Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco Baroni
Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language.
2 code implementations • LREC 2018 • Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov
Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance.
Ranked #12 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
5 code implementations • LREC 2018 • Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin
Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl.
1 code implementation • 30 Oct 2017 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov
This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings.
2 code implementations • ICCV 2017 • Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, Josef Sivic
Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks.
Ranked #35 on Video Retrieval on LSMDC
6 code implementations • ICML 2018 • Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam
Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images.
1 code implementation • ICML 2017 • Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier
We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1.
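The constraint can be maintained with a simple post-step projection; below is a sketch of that "Parseval update" on a toy weight matrix (the paper interleaves a small β with SGD steps, whereas here a larger β is iterated to convergence purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def parseval_step(w, beta=0.5):
    # One Parseval tightness update: pushes W @ W.T toward the identity,
    # driving every singular value (hence the Lipschitz constant) to 1.
    return (1 + beta) * w - beta * (w @ w.T @ w)

w = rng.normal(scale=0.1, size=(4, 8))
for _ in range(50):
    w = parseval_step(w, beta=0.5)

spec = np.linalg.norm(w, 2)   # largest singular value, now ~1
```

The update acts only on the singular values (σ ← (1+β)σ − βσ³, fixed point σ = 1), so it is cheap and needs no explicit SVD.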
1 code implementation • ICML 2017 • Piotr Bojanowski, Armand Joulin
We propose to fix a set of target representations, called Noise As Targets (NAT), and to constrain the deep features to align to them.
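A sketch of the NAT idea with a greedy matcher; the sizes and the matcher are illustrative — the paper updates a permutation per mini-batch, and an optimal matching would use the Hungarian algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 32, 8
# Fixed targets: noise vectors on the unit sphere, never updated.
targets = rng.normal(size=(n, d))
targets /= np.linalg.norm(targets, axis=1, keepdims=True)

feats = rng.normal(size=(n, d))   # stand-in for the network's outputs
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Greedy one-to-one assignment of each feature to its closest free target.
sim = feats @ targets.T
assign = -np.ones(n, dtype=int)
taken = np.zeros(n, dtype=bool)
for i in np.argsort(-sim.max(axis=1)):
    j = np.argmax(np.where(taken, -np.inf, sim[i]))
    assign[i], taken[j] = j, True

# Training would then minimize ||feats - targets[assign]||^2 w.r.t. the
# network parameters only; the targets themselves stay fixed.
loss = float(((feats - targets[assign]) ** 2).sum())
```

Fixing the targets rules out the collapsed solution where all features converge to a single point.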
44 code implementations • 12 Dec 2016 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, Tomas Mikolov
We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.
54 code implementations • TACL 2017 • Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov
A vector representation is associated to each character $n$-gram; words being represented as the sum of these representations.
Ranked #3 on Word Similarity on WS353
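The subword scheme can be sketched as below; the bucket count, dimension, and Python's built-in hash are stand-ins (fastText hashes n-grams with FNV into millions of buckets and also keeps a vector for the full word):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    # Boundary markers distinguish prefixes/suffixes, as in fastText.
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

rng = np.random.default_rng(0)
n_buckets, dim = 2_000, 16
# Hypothetical bucket table: n-grams are hashed into a fixed number of
# rows so the set of n-gram vectors stays bounded.
table = rng.normal(size=(n_buckets, dim))

def word_vector(word):
    grams = char_ngrams(word)
    rows = [hash(g) % n_buckets for g in grams]  # stand-in hash
    return table[rows].sum(axis=0)               # word = sum of n-gram vectors

v = word_vector("where")
```

Because any string decomposes into n-grams, this representation yields vectors even for words never seen during training.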
65 code implementations • EACL 2017 • Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov
This paper explores a simple and efficient baseline for text classification.
Ranked #1 on Sentiment Analysis on Sogou News
Emotion Recognition in Conversation
General Classification
+2
no code implementations • CVPR 2016 • Guillaume Seguin, Piotr Bojanowski, Remi Lajugie, Ivan Laptev
We address the problem of segmenting multiple object instances in complex videos.
1 code implementation • 19 Nov 2015 • Piotr Bojanowski, Armand Joulin, Tomas Mikolov
The first one consists in conditioning the character-level representation on the previous word representation.
no code implementations • CVPR 2016 • Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien
Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.
Ranked #7 on Temporal Action Localization on CrossTask
no code implementations • 5 Jun 2015 • Rémi Lajugie, Piotr Bojanowski, Sylvain Arlot, Francis Bach
In this paper, we address the problem of multi-label classification.
no code implementations • ICCV 2015 • Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid
Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.
no code implementations • 4 Jul 2014 • Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic
We are given a set of video clips, each one annotated with an {\em ordered} list of actions, such as "walk" then "sit" then "answer phone" extracted from, for example, the associated text script.