no code implementations • 31 Oct 2024 • Mathilde Caron, Alireza Fathi, Cordelia Schmid, Ahmet Iscen
Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data.
1 code implementation • 11 Sep 2024 • Alfonso Taboada Warmerdam, Mathilde Caron, Yuki M. Asano
We validate the usefulness of learning binary masks as a fine-tuning method on 8 datasets and 3 model architectures, and we demonstrate the effectiveness of SMNs in 3 label-efficient settings.
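The core idea lends itself to a compact sketch: fine-tune by learning only a binary mask over frozen pretrained weights. The sketch below uses a straight-through estimator to make the hard mask trainable; the `MaskedLinear` name, zero-initialized scores, and threshold-at-zero rule are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Frozen linear layer whose only trainable parameters are real-valued
    mask scores; a binary mask is derived from them at forward time."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.scores = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x):
        hard = (self.scores >= 0).float()              # keep a weight iff its score >= 0
        # straight-through estimator: hard mask forward, identity gradient backward
        mask = hard + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask)

layer = MaskedLinear(nn.Linear(16, 4))
out = layer(torch.randn(2, 16))                        # only `scores` receives gradients
```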
2 code implementations • CVPR 2024 • Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid
In this paper, we address web-scale visual entity recognition, specifically the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.
no code implementations • 14 Dec 2023 • Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G. M. Snoek, Bjorn Ommer
However, recent studies have revealed that the feature representation derived from the diffusion model itself is discriminative for numerous downstream tasks as well, which prompts us to propose a framework to extract guidance from, and specifically for, diffusion models.
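As a rough illustration of reusing a denoiser's internal activations as features, one can attach a forward hook to an intermediate block; the stand-in module, shapes, and pooling below are assumptions for illustration, not the paper's pipeline.

```python
import torch
import torch.nn as nn

# stand-in for an intermediate block of a denoising network; a forward hook
# captures its activations so they can be reused as features downstream
block = nn.Conv2d(64, 64, kernel_size=3, padding=1)
feats = {}
block.register_forward_hook(lambda mod, inp, out: feats.update(h=out.detach()))
_ = block(torch.randn(1, 64, 32, 32))                  # one (toy) denoising forward pass
features = feats["h"].mean(dim=(2, 3))                 # global-pooled (1, 64) feature vector
```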
no code implementations • 26 Oct 2023 • Roy Hirsch, Regev Cohen, Mathilde Caron, Tomer Golany, Daniel Freedman, Ehud Rivlin
A key element of computer-assisted surgery systems is phase recognition of surgical videos.
1 code implementation • 23 Aug 2023 • Roy Hirsch, Mathilde Caron, Regev Cohen, Amir Livne, Ron Shapiro, Tomer Golany, Roman Goldenberg, Daniel Freedman, Ehud Rivlin
To fully exploit the power of SSL, we create sizable unlabeled endoscopic video datasets for training MSNs.
Ranked #2 on Surgical Phase Recognition on Cholec80
2 code implementations • 12 Jul 2023 • Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby
The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged.
no code implementations • 12 Jun 2023 • Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid
Contrastive image-text models such as CLIP form the building blocks of many state-of-the-art systems.
Ranked #3 on Fine-Grained Image Recognition on OVEN
1 code implementation • ICCV 2023 • Liliane Momeni, Mathilde Caron, Arsha Nagrani, Andrew Zisserman, Cordelia Schmid
Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time.
Ranked #19 on Zero-Shot Video Question Answer on NExT-QA
1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
The scaling of Transformers has driven breakthrough capabilities for language models.
Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet
4 code implementations • CVPR 2023 • Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic
Vision Transformers convert images to sequences by slicing them into patches.
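The patch-slicing step reads naturally as a strided convolution, as in common ViT implementations; patch size 16 and embedding dimension 768 below are conventional defaults, assumed for illustration (this paper varies the patch size at training time).

```python
import torch
import torch.nn as nn

patch_size = 16                                    # assumed default; FlexiViT varies this
embed_dim = 768
# a strided conv slices the image into non-overlapping patches and embeds each one
patchify = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

img = torch.randn(1, 3, 224, 224)
tokens = patchify(img).flatten(2).transpose(1, 2)  # (1, 196, 768): one token per patch
print(tokens.shape)
```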
1 code implementation • 5 Dec 2022 • Mathilde Caron, Neil Houlsby, Cordelia Schmid
Pixel-level labels are particularly expensive to acquire.
no code implementations • 10 Oct 2022 • Ahmet Iscen, Thomas Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
2 code implementations • 14 Apr 2022 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations.
Tasks: Self-Supervised Image Classification • Self-Supervised Learning • +1
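A minimal sketch of the MSN idea: randomly drop patch tokens of one view and match its prototype assignment to that of the unmasked view. Prototype count, temperatures, and the EMA target encoder are simplified assumptions here.

```python
import torch
import torch.nn.functional as F

def random_mask(tokens, keep_ratio=0.3):
    """Randomly drop patch tokens of the anchor view. tokens: (B, N, D)."""
    B, N, D = tokens.shape
    keep = max(1, int(N * keep_ratio))
    idx = torch.rand(B, N).argsort(dim=1)[:, :keep]        # random token subset per image
    return tokens.gather(1, idx.unsqueeze(-1).expand(B, keep, D))

masked = random_mask(torch.randn(8, 196, 768))             # (8, 58, 768) surviving tokens

# match prototype assignments of the masked view to those of the unmasked view
protos = F.normalize(torch.randn(64, 128), dim=-1)         # 64 assumed prototypes
z_masked = F.normalize(torch.randn(8, 128), dim=-1)        # encoder(masked view)
z_target = F.normalize(torch.randn(8, 128), dim=-1)        # EMA encoder(unmasked view)
p_masked = F.softmax(z_masked @ protos.T / 0.1, dim=-1)
p_target = F.softmax(z_target @ protos.T / 0.025, dim=-1)  # sharper target distribution
loss = -(p_target * p_masked.log()).sum(dim=-1).mean()     # cross-entropy between views
```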
1 code implementation • 16 Feb 2022 • Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski
Discriminative self-supervised learning allows training models on any random group of internet images, possibly recovering salient information that helps differentiate between the images.
Ranked #1 on Copy Detection on Copydays strong subset (using extra training data)
6 code implementations • 16 Dec 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers and show that it leads to strong performance in various retrieval settings.
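The training signal here is a standard in-batch contrastive (InfoNCE) loss over a dense text encoder; the sketch below assumes positive pairs come from two spans of the same document and uses an illustrative temperature of 0.05.

```python
import torch
import torch.nn.functional as F

def info_nce(q, k, temperature=0.05):
    """In-batch contrastive loss: each query's positive is the passage at the
    same batch index; all other passages in the batch serve as negatives."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.T / temperature                 # (B, B) similarity matrix
    labels = torch.arange(q.size(0))               # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# embeddings would come from a shared text encoder over two spans of a document
loss = info_nce(torch.randn(32, 768), torch.randn(32, 768))
```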
no code implementations • 29 Sep 2021 • Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave
By contrast, in many other NLP tasks, conventional self-supervised pre-training based on masking leads to strong generalization with a small number of training examples.
11 code implementations • NeurIPS 2021 • Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.
Ranked #57 on Instance Segmentation on COCO minival
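A single-head sketch of this transposed attention: the softmax runs over a d x d cross-covariance map instead of an N x N token map, so the cost is linear in sequence length. The learnable temperature and multi-head bookkeeping of the paper are omitted.

```python
import torch
import torch.nn.functional as F

def cross_covariance_attention(q, k, v, tau=1.0):
    """'Transposed' self-attention: a d x d attention map mixes feature
    channels instead of an N x N map over tokens. q, k, v: (B, N, d)."""
    q = F.normalize(q, dim=1)                  # L2-normalize along the token axis
    k = F.normalize(k, dim=1)
    attn = F.softmax(k.transpose(1, 2) @ q / tau, dim=-1)   # (B, d, d) cross-covariance
    return v @ attn                            # (B, N, d): channels re-mixed per token

out = cross_covariance_attention(*(torch.randn(2, 196, 64) for _ in range(3)))
```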
15 code implementations • NeurIPS 2021 • Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
Ranked #1 on Image Classification on Certificate Verification
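One residual block of this design alternates a linear layer across patches with a per-patch MLP across channels, using simple affine rescaling in place of normalization; the sketch omits the paper's LayerScale-style initialization, and the dimensions are assumed for illustration.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Per-channel affine rescaling, used by ResMLP instead of normalization."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
    def forward(self, x):
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    def __init__(self, num_patches, dim, hidden):
        super().__init__()
        self.aff1, self.aff2 = Affine(dim), Affine(dim)
        self.token_mix = nn.Linear(num_patches, num_patches)  # linear across patches
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
    def forward(self, x):                          # x: (B, num_patches, dim)
        x = x + self.token_mix(self.aff1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.mlp(self.aff2(x))

out = ResMLPBlock(196, 384, 1536)(torch.randn(2, 196, 384))
```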
30 code implementations • ICCV 2021 • Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).
Ranked #2 on Copy Detection on Copydays strong subset
4 code implementations • ICCV 2021 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat
This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS).
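At its core, each view is classified by a soft nearest-neighbour vote over a small labelled support set, and the two views are trained to agree; the sketch below omits the paper's target sharpening and mean-entropy regularization, and all shapes are toy assumptions.

```python
import torch
import torch.nn.functional as F

def snn(z, support, support_labels, tau=0.1):
    """Soft nearest-neighbour classifier: label a view by a similarity-weighted
    vote over a small labelled support set. z: (B, d), support: (S, d)."""
    z, support = F.normalize(z, dim=-1), F.normalize(support, dim=-1)
    sim = F.softmax(z @ support.T / tau, dim=-1)   # (B, S) similarities
    return sim @ support_labels                    # (B, C) soft class assignment

support = torch.randn(40, 128)                     # assumed support embeddings
labels = F.one_hot(torch.randint(0, 10, (40,)), num_classes=10).float()
p1 = snn(torch.randn(16, 128), support, labels)    # assignment of view 1
p2 = snn(torch.randn(16, 128), support, labels)    # assignment of view 2
# each view predicts the other's assignment (target sharpening omitted)
loss = -0.5 * ((p2.detach() * p1.log()).sum(-1) + (p1.detach() * p2.log()).sum(-1)).mean()
```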
1 code implementation • 2 Mar 2021 • Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods.
Ranked #6 on Image Classification on Places205
Tasks: Self-Supervised Image Classification • Self-Supervised Learning • +1
18 code implementations • NeurIPS 2020 • Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
In addition, we propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without substantially increasing the memory or compute requirements (sketched below).
Ranked #9 on Image Classification on OmniBenchmark
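Multi-crop is easy to state in code: two full-resolution global views plus several cheap low-resolution local views per image. The crop sizes and scale ranges below follow commonly used defaults and are assumptions, not the exact published values.

```python
import torch
from torchvision import transforms

def multi_crop(num_local=6):
    """Multi-crop: two global views at full resolution plus several cheap
    low-resolution local views of the same image."""
    global_t = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.14, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    local_t = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.14)),  # small low-res crops
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    def apply(img):
        return [global_t(img) for _ in range(2)] + [local_t(img) for _ in range(num_local)]
    return apply

# usage: views = multi_crop()(pil_image) -> list of 8 tensors of two resolutions
```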
no code implementations • 10 Jan 2020 • Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin
In this work, we investigate the use of standard pruning methods, developed primarily for supervised learning, for networks trained without labels (i.e., on self-supervised tasks).
no code implementations • 25 Sep 2019 • Mathilde Caron, Ari Morcos, Piotr Bojanowski, Julien Mairal, Armand Joulin
The lottery ticket hypothesis argues that neural networks contain sparse subnetworks, which, if appropriately initialized (the winning tickets), are capable of matching the accuracy of the full network when trained in isolation.
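The pruning loop behind this hypothesis is short: train, remove the smallest-magnitude weights, rewind the survivors to their initial values, repeat. Below is one global magnitude-pruning round as a sketch; restricting to weight matrices and the 20% rate are conventional choices, assumed here.

```python
import torch
import torch.nn as nn

def magnitude_prune(model, sparsity=0.2):
    """One round of global magnitude pruning: zero out the smallest-magnitude
    weights; rewinding the survivors to their initial values yields a
    candidate 'winning ticket'."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    k = max(1, int(sparsity * weights.numel()))
    threshold = weights.kthvalue(k).values
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            masks[name] = (p.detach().abs() > threshold).float()
            p.data.mul_(masks[name])
    return masks                                # reapply after every optimizer step

masks = magnitude_prune(nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)))
```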
2 code implementations • ICCV 2019 • Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin
Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available.
Ranked #65 on Self-Supervised Image Classification on ImageNet (finetuned) (using extra training data)
9 code implementations • ECCV 2018 • Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.
Ranked #5 on Unsupervised Semantic Segmentation on ImageNet-S-50
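The alternation is simple to sketch: cluster the current features to obtain pseudo-labels, then train the network to predict them, re-clustering each epoch. The toy encoder, sklearn KMeans (in place of large-scale k-means), and all shapes below are assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # stand-in for a convnet
classifier = nn.Linear(128, 10)
opt = torch.optim.SGD(list(encoder.parameters()) + list(classifier.parameters()), lr=0.01)
images = torch.randn(256, 3, 32, 32)                                # toy unlabeled images

for epoch in range(3):
    with torch.no_grad():                                           # step 1: cluster features
        feats = encoder(images)
    pseudo = torch.as_tensor(
        KMeans(n_clusters=10, n_init=10).fit_predict(feats.numpy()), dtype=torch.long)
    classifier.reset_parameters()                                   # fresh head for new labels
    for _ in range(10):                                             # step 2: fit pseudo-labels
        opt.zero_grad()
        loss = nn.functional.cross_entropy(classifier(encoder(images)), pseudo)
        loss.backward()
        opt.step()
```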