1 code implementation • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
We show that not all combinations of paired data are necessary to train such a joint embedding; image-paired data alone is sufficient to bind the modalities together.
Ranked #2 on Zero-shot Audio Classification on AudioSet (using extra training data)
6 code implementations • 14 Apr 2023 • Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
Ranked #1 on Image Retrieval on AmsterTime (using extra training data)
no code implementations • 10 Apr 2023 • João Maria Janeiro, Stanislav Frolov, Alaaeldin El-Nouby, Jakob Verbeek
For example, for segmentation, mIoU is reduced from 44.5 to 30.5 when compressing to 0.1 bpp using the best compression model we evaluated.
no code implementations • 26 Jan 2023 • Matthew J. Muckley, Alaaeldin El-Nouby, Karen Ullrich, Hervé Jégou, Jakob Verbeek
Lossy image compression aims to represent images in as few bits as possible while maintaining fidelity to the original.
no code implementations • 14 Dec 2022 • Alaaeldin El-Nouby, Matthew J. Muckley, Karen Ullrich, Ivan Laptev, Jakob Verbeek, Hervé Jégou
In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression.
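The core of vector quantization for compression can be sketched in a few lines: each feature vector is replaced by the index of its nearest codebook centroid, so only integer indices need to be transmitted. This is a minimal illustration with a random codebook, not the paper's actual model; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: a codebook of K centroids quantizes
# d-dimensional feature vectors; only the integer indices are transmitted.
K, d = 16, 4
codebook = rng.normal(size=(K, d))          # centroids (random here, learned in practice)
features = rng.normal(size=(10, d))         # e.g. a patch of latent features

# Nearest-centroid assignment (the core of vector quantization).
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1)              # what the encoder transmits
reconstruction = codebook[indices]          # what the decoder recovers

# Rate: log2(K) bits per vector instead of 32*d bits for raw floats.
bits_per_vector = np.log2(K)
```

With K = 16 centroids, each 4-dimensional vector costs 4 bits instead of 128, at the price of quantization error.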
1 code implementation • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
Furthermore, this model can be learned by dropping 90% of the image and 95% of the video patches, enabling extremely fast training of huge model architectures.
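The patch-dropping idea above can be sketched as random masking before the encoder: with a 90% mask ratio, only 10% of the patch tokens are processed, which is what makes training large architectures fast. This is a generic masking sketch with hypothetical sizes, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch of patch dropping: with a 90% mask ratio, only
# 10% of image patches are fed to the encoder.
num_patches, dim = 196, 64                  # e.g. a 14x14 patch grid
patches = rng.normal(size=(num_patches, dim))

mask_ratio = 0.9
keep = int(num_patches * (1 - mask_ratio))  # patches the encoder sees
perm = rng.permutation(num_patches)
visible_idx = perm[:keep]
encoder_input = patches[visible_idx]        # only ~10% of the tokens
```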
4 code implementations • 18 Mar 2022 • Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou
(2) Fine-tuning the weights of the attention layers is sufficient to adapt vision transformers to a higher resolution and to other classification tasks.
Ranked #8 on Image Classification on CIFAR-10
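Fine-tuning only the attention layers amounts to selecting the attention parameters as trainable and freezing everything else. A framework-agnostic sketch, with hypothetical parameter names following a typical vision-transformer naming scheme:

```python
# Hypothetical sketch: selecting only the attention-layer parameters of a
# vision transformer for fine-tuning; all other weights stay frozen.
param_names = [
    "patch_embed.weight",
    "blocks.0.attn.qkv.weight",
    "blocks.0.attn.proj.weight",
    "blocks.0.mlp.fc1.weight",
    "head.weight",
]

def trainable_names(names):
    """Keep only attention-layer parameters trainable."""
    return [n for n in names if ".attn." in n]

tuned = trainable_names(param_names)
```

In a real framework this selection would set the trainable flag per parameter (e.g. disabling gradients for everything outside the returned set).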
5 code implementations • 27 Dec 2021 • Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
Ranked #37 on Semantic Segmentation on ADE20K val
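The attention-based global map can be sketched as a single learned query attending over the flattened convolutional feature map, producing one globally pooled vector in place of average pooling. This is a minimal single-query sketch with hypothetical sizes, not the paper's exact aggregation module.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sketch: one learned query attends over all spatial
# locations of a conv feature map (non-local reasoning).
h = w = 7
d = 32
feature_map = rng.normal(size=(h * w, d))   # conv features, flattened
query = rng.normal(size=(d,))               # learned global query

attn = softmax(feature_map @ query / np.sqrt(d))  # one weight per location
global_vec = attn @ feature_map                   # attention-weighted pool
```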
no code implementations • 20 Dec 2021 • Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jégou, Edouard Grave
Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain performance competitive with ImageNet pre-training on a variety of classification datasets from different domains.
11 code implementations • NeurIPS 2021 • Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.
Ranked #54 on Instance Segmentation on COCO minival
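The "transposed" attention described above can be sketched as follows: the attention map is d × d (between feature channels, via the cross-covariance of keys and queries) rather than N × N (between tokens), so its cost grows linearly with the number of tokens. A minimal sketch with hypothetical sizes; normalization and temperature details of the actual method are simplified.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sketch of cross-covariance ("transposed") attention:
# the d x d attention map mixes channels instead of tokens.
N, d = 196, 16                               # tokens, channels
q = rng.normal(size=(N, d))
k = rng.normal(size=(N, d))
v = rng.normal(size=(N, d))

# Normalize each channel across tokens, then form the d x d map.
qn = q / np.linalg.norm(q, axis=0, keepdims=True)
kn = k / np.linalg.norm(k, axis=0, keepdims=True)
attn = softmax(qn.T @ kn, axis=-1)           # (d, d) cross-covariance map
out = v @ attn.T                             # (N, d) channel-mixed tokens
```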
16 code implementations • NeurIPS 2021 • Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
Ranked #5 on Image Classification on ImageNet ReaL (Top 1 Accuracy metric)
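One residual block of an all-MLP architecture like the one above can be sketched as a linear layer that mixes information across patches followed by an MLP that mixes across channels, each with a residual connection. This is a hypothetical sketch with random weights; the affine normalization and activation choice of the actual architecture are simplified (ReLU here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch of one all-MLP block: cross-patch linear mixing,
# then a per-patch channel MLP, each with a residual connection.
N, d = 16, 8                                 # patches, channel dim
x = rng.normal(size=(N, d))

W_patch = rng.normal(size=(N, N)) * 0.1      # mixes across patches
W1 = rng.normal(size=(d, 4 * d)) * 0.1       # channel MLP, expansion 4
W2 = rng.normal(size=(4 * d, d)) * 0.1

x = x + W_patch @ x                          # mix across patches
x = x + np.maximum(x @ W1, 0) @ W2           # mix across channels (ReLU)
```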
11 code implementations • ICCV 2021 • Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze
We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime.
Ranked #11 on Image Classification on iNaturalist 2019
1 code implementation • 10 Feb 2021 • Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou
Transformers have shown outstanding results for natural language understanding and, more recently, for image classification.
no code implementations • 28 Oct 2019 • Alaaeldin El-Nouby, Shuangfei Zhai, Graham W. Taylor, Joshua M. Susskind
Deep neural networks require collecting and annotating large amounts of data to train successfully.
Ranked #43 on Self-Supervised Action Recognition on UCF101
3 code implementations • ICCV 2019 • Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor
Conditional text-to-image generation is an active area of research, with many possible applications.
Ranked #2 on Text-to-Image Generation on GeNeVA (i-CLEVR)
no code implementations • 23 Feb 2018 • Alaaeldin El-Nouby, Graham W. Taylor
Finally, for better network initialization, we transfer from the task of action recognition to action detection by pre-training our framework using the recently released large-scale Kinetics dataset.