no code implementations • 31 Jul 2023 • Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann Lecun
Specifically, we condition the model on stochastic masked token positions, guiding it toward features that are more robust to location uncertainties.
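A minimal sketch of the idea described above, assuming the stochasticity takes the form of noise added to the positional embeddings of masked tokens (the noise distribution, scale, and injection point are assumptions of this sketch, not details from the paper):

    import torch

    def stochastic_masked_positions(pos_emb, mask, noise_std=0.1):
        # pos_emb: (batch, num_tokens, dim) deterministic positional embeddings
        # mask:    (batch, num_tokens) boolean, True where a token is masked
        noise = noise_std * torch.randn_like(pos_emb)
        # Only masked tokens receive stochastic positions; visible tokens keep
        # their exact locations.
        return torch.where(mask.unsqueeze(-1), pos_emb + noise, pos_emb)

    # toy usage
    pos = torch.randn(2, 16, 64)
    mask = torch.rand(2, 16) < 0.75
    noisy_pos = stochastic_masked_positions(pos, mask)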
6 code implementations • 14 Apr 2023 • Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski
Recent breakthroughs in natural language processing, driven by model pretraining on large quantities of data, have opened the way for similar foundation models in computer vision.
Ranked #1 on Image Retrieval on AmsterTime (using extra training data)
3 code implementations • CVPR 2023 • Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann Lecun, Nicolas Ballas
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data augmentations.
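A heavily simplified sketch of the latent-prediction idea (predicting representations of masked target patches from visible context patches, with no pixel reconstruction and no augmentations); the linear stand-in encoders, the fixed context/target split, and the mean-pooled predictor input are placeholders of this sketch, not the paper's architecture:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    dim = 64
    context_encoder = nn.Linear(dim, dim)      # stand-ins for the real ViT encoders
    target_encoder = nn.Linear(dim, dim)
    predictor = nn.Linear(dim, dim)

    tokens = torch.randn(8, 196, dim)          # patchified images, no augmentations
    ctx_idx = torch.arange(0, 98)              # visible "context" patches
    tgt_idx = torch.arange(98, 196)            # masked "target" patches

    ctx = context_encoder(tokens[:, ctx_idx])
    with torch.no_grad():                      # target features come from a frozen/EMA encoder
        tgt = target_encoder(tokens[:, tgt_idx])
    pred = predictor(ctx.mean(dim=1, keepdim=True)).expand_as(tgt)
    loss = F.smooth_l1_loss(pred, tgt)         # regression in representation space, not pixel space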
no code implementations • 13 Oct 2022 • Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas
A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN).
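To make the dependence on mini-batch statistics concrete, here is a minimal SimCLR-style NT-Xent loss (one of the objectives named above): every other element of the batch serves as a negative, so the loss value changes with batch composition and size.

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, temperature=0.1):
        # z1, z2: (N, d) embeddings of two views of the same N images
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        z = torch.cat([z1, z2], dim=0)                  # (2N, d)
        sim = z @ z.t() / temperature                   # all pairwise similarities in the batch
        sim.fill_diagonal_(float('-inf'))               # exclude self-similarity
        n = z1.size(0)
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)            # positive: the other view of each image

    loss = nt_xent(torch.randn(32, 128), torch.randn(32, 128))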
2 code implementations • 14 Apr 2022 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations.
Self-Supervised Image Classification • Self-Supervised Learning • +1
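For the Masked Siamese Networks entry above, a hedged sketch of the matching objective as commonly described: soft prototype assignments of a masked (anchor) view are trained to match those of an unmasked (target) view; the temperatures are illustrative and the paper's entropy regularization is omitted.

    import torch
    import torch.nn.functional as F

    def msn_style_loss(anchor_feat, target_feat, prototypes, temp=0.1, target_temp=0.025):
        anchor = F.normalize(anchor_feat, dim=1) @ F.normalize(prototypes, dim=1).t()
        target = F.normalize(target_feat, dim=1) @ F.normalize(prototypes, dim=1).t()
        log_p_anchor = F.log_softmax(anchor / temp, dim=1)
        with torch.no_grad():                           # no gradients through the target view
            p_target = F.softmax(target / target_temp, dim=1)
        return -(p_target * log_p_anchor).sum(dim=1).mean()

    prototypes = torch.randn(1024, 256)                 # learnable prototypes in the real method
    loss = msn_style_loss(torch.randn(32, 256), torch.randn(32, 256), prototypes)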
2 code implementations • ICLR 2022 • Paul-Aymeric McRae, Prasanna Parthasarathi, Mahmoud Assran, Sarath Chandar
Popular approaches for minimizing loss in data-driven learning often retain the history of gradients, either explicitly or through an abstraction of it, to make parameter updates more efficient.
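As a concrete instance of such an abstraction of the gradient history, a minimal Adam-style step keeps that history only as two exponential moving averages (this illustrates the standard optimizer, not the method proposed in the paper):

    import torch

    def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        m.mul_(b1).add_(grad, alpha=1 - b1)             # first-moment EMA of past gradients
        v.mul_(b2).addcmul_(grad, grad, value=1 - b2)   # second-moment EMA of past gradients
        m_hat = m / (1 - b1 ** t)                       # bias correction
        v_hat = v / (1 - b2 ** t)
        param.add_(-lr * m_hat / (v_hat.sqrt() + eps))
        return param, m, v

    p, m, v = torch.zeros(10), torch.zeros(10), torch.zeros(10)
    p, m, v = adam_step(p, torch.randn(10), m, v, t=1)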
4 code implementations • ICCV 2021 • Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat
This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS).
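A hedged sketch of the view-assignment idea: each unlabeled view is softly assigned to a set of labeled support samples, and one view's assignment is trained to match a sharpened assignment of the other. The sharpening rule and the omission of the paper's mean-entropy regularizer are simplifications of this sketch.

    import torch
    import torch.nn.functional as F

    def paws_style_loss(view1, view2, support_feats, support_probs, temp=0.1):
        def assign(x):                                  # soft assignment over labeled support samples
            sim = F.normalize(x, dim=1) @ F.normalize(support_feats, dim=1).t()
            return F.softmax(sim / temp, dim=1) @ support_probs
        def sharpen(p):
            return (p ** 2) / (p ** 2).sum(dim=1, keepdim=True)
        p1, p2 = assign(view1), assign(view2)
        ce = lambda t, p: -(t * p.clamp_min(1e-8).log()).sum(dim=1).mean()
        return 0.5 * (ce(sharpen(p2).detach(), p1) + ce(sharpen(p1).detach(), p2))

    support_feats = torch.randn(64, 128)                # embeddings of labeled support samples
    support_probs = F.one_hot(torch.randint(0, 10, (64,)), 10).float()
    loss = paws_style_loss(torch.randn(32, 128), torch.randn(32, 128), support_feats, support_probs)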
no code implementations • 6 Oct 2020 • Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat
Surprisingly, we find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods, despite using a much weaker synchronization mechanism.
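A rough sketch of a codistillation-style objective, assuming the usual formulation in which each worker adds a distillation term pulling its predictions toward a peer model's (possibly stale, infrequently exchanged) predictions; the weight and temperature are illustrative.

    import torch
    import torch.nn.functional as F

    def codistillation_loss(logits, labels, peer_logits, alpha=0.5, temp=1.0):
        task = F.cross_entropy(logits, labels)
        distill = F.kl_div(F.log_softmax(logits / temp, dim=1),
                           F.softmax(peer_logits.detach() / temp, dim=1),
                           reduction='batchmean')       # pull predictions toward the peer's
        return task + alpha * distill

    loss = codistillation_loss(torch.randn(16, 10), torch.randint(0, 10, (16,)), torch.randn(16, 10))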
no code implementations • 24 Jun 2020 • Mahmoud Assran, Arda Aytekin, Hamid Feyzmahdavian, Mikael Johansson, Michael Rabbat
Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade.
2 code implementations • 18 Jun 2020 • Mahmoud Assran, Nicolas Ballas, Lluis Castrejon, Michael Rabbat
We investigate a strategy for improving the efficiency of contrastive learning of visual representations by leveraging a small amount of supervised information during pre-training.
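A sketch of one way a small amount of supervised information can enter a contrastive objective: labeled samples of the same class are treated as positives for one another. In the paper this supervised term is combined with a standard self-supervised contrastive loss; the exact form below is an assumption of this sketch.

    import torch
    import torch.nn.functional as F

    def supervised_nce_loss(feats, labels, temp=0.1):
        z = F.normalize(feats, dim=1)
        sim = z @ z.t() / temp
        sim.fill_diagonal_(float('-inf'))               # ignore self-similarity
        same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
        same.fill_diagonal_(0)                          # positives: other samples with the same label
        log_p = F.log_softmax(sim, dim=1)
        pos_counts = same.sum(dim=1).clamp_min(1)
        return -((same * log_p).sum(dim=1) / pos_counts).mean()

    loss = supervised_nce_loss(torch.randn(32, 128), torch.randint(0, 4, (32,)))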
no code implementations • ICML 2020 • Mahmoud Assran, Michael Rabbat
We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches).
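For reference, one step of Nesterov's accelerated gradient with constant step-size and momentum, using a stochastic (unbiased, bounded-variance) gradient evaluated at the look-ahead point, which is the setting the abstract describes:

    import torch

    def nag_step(x, x_prev, stochastic_grad, lr=0.1, momentum=0.9):
        y = x + momentum * (x - x_prev)        # look-ahead (extrapolation) point
        x_next = y - lr * stochastic_grad(y)   # gradient evaluated at y, not at x
        return x_next, x

    # toy quadratic with noisy but unbiased gradients
    grad = lambda y: 2.0 * y + 0.01 * torch.randn_like(y)
    x, x_prev = torch.ones(5), torch.ones(5)
    for _ in range(100):
        x, x_prev = nag_step(x, x_prev, grad)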
1 code implementation • NeurIPS 2019 • Mahmoud Assran, Joshua Romoff, Nicolas Ballas, Joelle Pineau, Michael Rabbat
We show that we can run several loosely coupled GALA agents in parallel on a single GPU and achieve significantly higher hardware utilization and frame-rates than vanilla A2C at comparable power draws.
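The loose coupling comes from gossip-style communication between learners; a minimal sketch of pairwise parameter mixing between two agents follows (the mixing weight and schedule are illustrative, not the paper's exact protocol):

    import torch

    def gossip_average(local_params, peer_params, mixing=0.5):
        # A learner occasionally mixes its parameters with one peer instead of
        # synchronizing with every worker at every step.
        with torch.no_grad():
            for p_local, p_peer in zip(local_params, peer_params):
                p_local.mul_(1 - mixing).add_(p_peer, alpha=mixing)

    a = [torch.ones(3), torch.zeros(2)]
    b = [torch.zeros(3), torch.ones(2)]
    gossip_average(a, b)                       # a is now halfway between the two learners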
2 code implementations • ICLR 2019 • Mahmoud Assran, Nicolas Loizou, Nicolas Ballas, Michael Rabbat
Distributed data-parallel algorithms aim to accelerate the training of deep neural networks by parallelizing the computation of large mini-batch gradient updates across multiple nodes.
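For context, a sketch of the exact-averaging data-parallel baseline that such algorithms build on: each worker's shard gradient is averaged (an all-reduce in a real system) and every worker applies the same update. As I understand it, the paper concerns relaxing this exact averaging with approximate, gossip-based communication, which is not shown here.

    import torch

    def data_parallel_step(params, per_worker_grads, lr=0.1):
        # Average each parameter's gradient across workers, then apply one
        # shared SGD update.
        with torch.no_grad():
            for i, p in enumerate(params):
                avg_grad = torch.stack([g[i] for g in per_worker_grads]).mean(dim=0)
                p.add_(avg_grad, alpha=-lr)

    params = [torch.ones(4)]
    worker_grads = [[torch.randn(4)] for _ in range(8)]   # gradients from 8 simulated workers
    data_parallel_step(params, worker_grads)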