Search Results for author: Alaaeldin El-Nouby

Found 16 papers, 10 papers with code

ImageBind: One Embedding Space To Bind Them All

1 code implementation • CVPR 2023 Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

We show that not all combinations of paired data are necessary to train such a joint embedding; image-paired data alone is sufficient to bind the modalities together.

Ranked #2 on Zero-shot Audio Classification on AudioSet (using extra training data)

Cross-Modal Retrieval, Retrieval, +7
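
The binding mechanism here is contrastive alignment of each extra modality to the image embedding space. Below is a minimal sketch, not the released ImageBind code: the symmetric InfoNCE loss is standard, and the pairing scheme in the comment is an illustration of the abstract's claim.

    import torch
    import torch.nn.functional as F

    def symmetric_info_nce(z_img, z_other, temperature=0.07):
        # Contrastive loss between paired image / other-modality embeddings.
        z_img = F.normalize(z_img, dim=-1)
        z_other = F.normalize(z_other, dim=-1)
        logits = z_img @ z_other.t() / temperature          # (B, B) similarities
        targets = torch.arange(z_img.size(0), device=z_img.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Only (image, audio), (image, depth), ... pairs are needed: aligning every
    # modality to images places them all in one shared space, so e.g. audio-depth
    # alignment emerges without any audio-depth training pairs.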

Are Visual Recognition Models Robust to Image Compression?

no code implementations • 10 Apr 2023 João Maria Janeiro, Stanislav Frolov, Alaaeldin El-Nouby, Jakob Verbeek

For example, segmentation mIoU drops from 44.5 to 30.5 when compressing to 0.1 bpp with the best compression model we evaluated.

Image Classification, Image Compression, +4
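
The evaluation protocol behind numbers like these is a lossy round-trip before inference. The paper sweeps learned codecs at fixed bitrates; the sketch below substitutes a plain JPEG round-trip via Pillow as an assumed stand-in.

    import io
    from PIL import Image

    def lossy_round_trip(img: Image.Image, quality: int = 10) -> Image.Image:
        # Encode/decode through a lossy codec before feeding the recognition model.
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return Image.open(buf).convert("RGB")

    # Accuracy / mIoU is then measured on lossy_round_trip(x) vs. the original x
    # across a range of bitrates to quantify robustness to compression.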

OmniMAE: Single Model Masked Pretraining on Images and Videos

1 code implementation • CVPR 2023 Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

Furthermore, this model can be learned by dropping 90% of the image and 95% of the video patches, enabling extremely fast training of huge model architectures.
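
Dropping that many patches amounts to keeping a small random subset of tokens before the encoder. A minimal sketch of such random masking follows; shapes and names are illustrative, not the OmniMAE code.

    import torch

    def keep_random_patches(tokens: torch.Tensor, keep_ratio: float):
        # tokens: (B, N, D) patch embeddings; keep a random keep_ratio fraction.
        B, N, D = tokens.shape
        n_keep = max(1, int(N * keep_ratio))
        idx = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :n_keep]
        kept = tokens.gather(1, idx.unsqueeze(-1).expand(B, n_keep, D))
        return kept, idx  # idx lets a decoder scatter predictions back in place

    # keep_ratio=0.10 for images (90% dropped), 0.05 for videos (95% dropped):
    # the encoder only ever sees the kept tokens, which is what makes training fast.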

Three things everyone should know about Vision Transformers

4 code implementations • 18 Mar 2022 Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

(2) Fine-tuning the weights of the attention layers is sufficient to adapt vision transformers to a higher resolution and to other classification tasks.

Fine-Grained Image Classification
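
Finding (2) translates directly into a freezing policy: train only the attention parameters plus the new classifier head. A sketch assuming timm's ViT parameter naming (".attn." submodules, "head" classifier); any ViT implementation works with its own names.

    import timm  # parameter names below are timm's, an assumption

    model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=100)
    for name, param in model.named_parameters():
        # Unfreeze attention blocks and the fresh classification head; freeze the rest.
        param.requires_grad = ".attn." in name or name.startswith("head")

    trainable = [n for n, p in model.named_parameters() if p.requires_grad]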

Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

no code implementations • 20 Dec 2021 Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jégou, Edouard Grave

Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets from different domains.

Denoising, Instance Segmentation, +1
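
A denoising autoencoder in this sense predicts the content of masked patches; BEiT, for instance, classifies each masked position into a discrete visual-token vocabulary. A hedged sketch of that objective (the tokenizer targets and encoder logits are placeholders):

    import torch
    import torch.nn.functional as F

    def beit_style_loss(logits, visual_tokens, mask):
        # logits: (B, N, V) predictions over a visual-token vocabulary of size V,
        # visual_tokens: (B, N) targets from a pretrained image tokenizer,
        # mask: (B, N) bool, True where the patch was masked out.
        return F.cross_entropy(logits[mask], visual_tokens[mask])

    # Unlike contrastive methods that compare embeddings of two augmented views,
    # the target here is local patch content, which the paper finds more robust
    # to the type and size of the pre-training data.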

XCiT: Cross-Covariance Image Transformers

11 code implementations • NeurIPS 2021 Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.

Instance Segmentation, object-detection, +3
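
The "transposed" attention above yields a d×d channel-attention map instead of an N×N token map, so cost grows linearly with the number of tokens. A minimal sketch follows; the released XCiT code additionally uses a learnable per-head temperature and local patch-interaction layers.

    import torch
    import torch.nn.functional as F

    def cross_covariance_attention(q, k, v, temperature=1.0):
        # q, k, v: (B, heads, N, d). Work on transposed (B, heads, d, N) views
        # so that channels, not tokens, attend to each other.
        q, k, v = (t.transpose(-2, -1) for t in (q, k, v))
        q = F.normalize(q, dim=-1)  # L2-normalize each channel over the tokens
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * temperature  # (B, heads, d, d) map
        attn = attn.softmax(dim=-1)
        out = attn @ v                                  # (B, heads, d, N)
        return out.transpose(-2, -1)                    # back to (B, heads, N, d)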

Training Vision Transformers for Image Retrieval

1 code implementation • 10 Feb 2021 Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou

Transformers have shown outstanding results for natural language understanding and, more recently, for image classification.

Image Classification, Image Retrieval, +3

Real-Time End-to-End Action Detection with Two-Stream Networks

no code implementations • 23 Feb 2018 Alaaeldin El-Nouby, Graham W. Taylor

Finally, for better network initialization, we transfer from the task of action recognition to action detection by pre-training our framework using the recently released large-scale Kinetics dataset.

Action Detection, Action Recognition, +3
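
The transfer step described above amounts to initializing the detector's backbone from recognition weights and fine-tuning, leaving detection-specific heads randomly initialized. A generic sketch with hypothetical checkpoint and module names, not the paper's code:

    import torch
    import torch.nn as nn

    class ActionDetector(nn.Module):
        # Placeholder architecture; the paper uses a two-stream (RGB + flow) network.
        def __init__(self, num_classes=24):
            super().__init__()
            self.backbone = nn.Conv3d(3, 64, kernel_size=3, padding=1)  # stand-in
            self.head = nn.Conv3d(64, num_classes, kernel_size=1)       # trained fresh

    detector = ActionDetector()
    # Hypothetical checkpoint from Kinetics action-recognition pretraining:
    state = torch.load("kinetics_recognition.pth", map_location="cpu")
    detector.backbone.load_state_dict(state, strict=False)  # heads stay random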
