Search Results for author: Alaaeldin El-Nouby

Found 17 papers, 11 papers with code

Scalable Pre-training of Large Autoregressive Image Models

2 code implementations • 16 Jan 2024 • Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar, Joshua M Susskind, Armand Joulin

Specifically, we highlight two key findings: (1) the performance of the visual features scale with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks.

Ranked #333 on Image Classification on ImageNet (using extra training data)

Image Classification

2,742

Paper
Code

ImageBind: One Embedding Space To Bind Them All

1 code implementation • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

We show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together.

Ranked #2 on Zero-shot Classification (unified classes) on LLVIP

Cross-Modal Retrieval Retrieval +7

7,865

Paper
Code

DINOv2: Learning Robust Visual Features without Supervision

11 code implementations • 14 Apr 2023 • Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.

Ranked #1 on Image Classification on CIFAR-10

Domain Generalization Fine-Grained Image Classification +5

124,793

Paper
Code

Are Visual Recognition Models Robust to Image Compression?

no code implementations • 10 Apr 2023 • João Maria Janeiro, Stanislav Frolov, Alaaeldin El-Nouby, Jakob Verbeek

For example, for segmentation mIoU is reduced from 44. 5 to 30. 5 mIoU when compressing to 0. 1 bpp using the best compression model we evaluated.

Image Classification Image Compression +4

Paper
Add Code

Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models

no code implementations • 26 Jan 2023 • Matthew J. Muckley, Alaaeldin El-Nouby, Karen Ullrich, Hervé Jégou, Jakob Verbeek

Lossy image compression aims to represent images in as few bits as possible while maintaining fidelity to the original.

Image Compression MS-SSIM +1

Paper
Add Code

Image Compression with Product Quantized Masked Image Modeling

no code implementations • 14 Dec 2022 • Alaaeldin El-Nouby, Matthew J. Muckley, Karen Ullrich, Ivan Laptev, Jakob Verbeek, Hervé Jégou

In this work, we attempt to bring these lines of research closer by revisiting vector quantization for image compression.

Image Compression Image Generation +3

Paper
Add Code

OmniMAE: Single Model Masked Pretraining on Images and Videos

1 code implementation • CVPR 2023 • Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

Furthermore, this model can be learned by dropping 90% of the image and 95% of the video patches, enabling extremely fast training of huge model architectures.

544

Paper
Code

Three things everyone should know about Vision Transformers

6 code implementations • 18 Mar 2022 • Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

(2) Fine-tuning the weights of the attention layers is sufficient to adapt vision transformers to a higher resolution and to other classification tasks.

Ranked #8 on Image Classification on CIFAR-10 (using extra training data)

Fine-Grained Image Classification

29,713

Paper
Code

Augmenting Convolutional networks with attention-based aggregation

5 code implementations • 27 Dec 2021 • Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou

We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.

Ranked #38 on Semantic Segmentation on ADE20K val

Classification Image Classification +3

3,864

Paper
Code

Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

no code implementations • 20 Dec 2021 • Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jegou, Edouard Grave

Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets, from different domains.

Denoising Instance Segmentation +1

Paper
Add Code

XCiT: Cross-Covariance Image Transformers

11 code implementations • NeurIPS 2021 • Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.

Ranked #55 on Instance Segmentation on COCO minival

Instance Segmentation object-detection +3

29,713

Paper
Code

ResMLP: Feedforward networks for image classification with data-efficient training

15 code implementations • NeurIPS 2021 • Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.

Ranked #1 on Image Classification on Certificate Verification

Data Augmentation Fine-Grained Image Classification +4

29,713

Paper
Code

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

11 code implementations • ICCV 2021 • Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze

We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime.

Ranked #11 on Image Classification on iNaturalist 2019

General Classification Image Classification

124,793

Paper
Code

Training Vision Transformers for Image Retrieval

1 code implementation • 10 Feb 2021 • Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou

Transformers have shown outstanding results for natural language understanding and, more recently, for image classification.

Image Classification Image Retrieval +3

Paper
Code

Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking

no code implementations • 28 Oct 2019 • Alaaeldin El-Nouby, Shuangfei Zhai, Graham W. Taylor, Joshua M. Susskind

Deep neural networks require collecting and annotating large amounts of data to train successfully.

Ranked #44 on Self-Supervised Action Recognition on UCF101

Future prediction Representation Learning +1

Paper
Add Code

Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction

3 code implementations • ICCV 2019 • Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, Graham W. Taylor

Conditional text-to-image generation is an active area of research, with many possible applications.

Ranked #2 on Text-to-Image Generation on GeNeVA (i-CLEVR)

Text-to-Image Generation

Paper
Code

Real-Time End-to-End Action Detection with Two-Stream Networks

no code implementations • 23 Feb 2018 • Alaaeldin El-Nouby, Graham W. Taylor

Finally, for better network initialization, we transfer from the task of action recognition to action detection by pre-training our framework using the recently released large-scale Kinetics dataset.

Action Detection Action Recognition +3

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.