Search Results for author: Hugo Touvron

Found 16 papers, 13 papers with code

DeiT III: Revenge of the ViT

2 code implementations14 Apr 2022 Hugo Touvron, Matthieu Cord, Hervé Jégou

Our evaluations on Image classification (ImageNet-1k with and without pre-training on ImageNet-21k), transfer learning and semantic segmentation show that our procedure outperforms by a large margin previous fully supervised training recipes for ViT.

 Ranked #1 on Image Classification on ImageNet ReaL (Number of params metric)

Data Augmentation Image Classification +3

Three things everyone should know about Vision Transformers

4 code implementations18 Mar 2022 Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

(2) Fine-tuning the weights of the attention layers is sufficient to adapt vision transformers to a higher resolution and to other classification tasks.

Ranked #6 on Image Classification on CIFAR-10 (using extra training data)

Fine-Grained Image Classification

Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

no code implementations20 Dec 2021 Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jegou, Edouard Grave

Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings. We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets, from different domains.

Denoising Instance Segmentation +1

ResNet strikes back: An improved training procedure in timm

9 code implementations NeurIPS Workshop ImageNet_PPF 2021 Ross Wightman, Hugo Touvron, Hervé Jégou

We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work.

Data Augmentation Fine-Grained Image Classification +1

XCiT: Cross-Covariance Image Transformers

10 code implementations NeurIPS 2021 Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.

Instance Segmentation object-detection +3

Emerging Properties in Self-Supervised Vision Transformers

16 code implementations ICCV 2021 Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).

Copy Detection Self-Supervised Image Classification +5

Going deeper with Image Transformers

14 code implementations ICCV 2021 Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou

In particular, we investigate the interplay of architecture and optimization of such dedicated transformers.

Ranked #3 on Image Classification on CIFAR-10 (using extra training data)

Image Classification Transfer Learning

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

4 code implementations19 Mar 2021 Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information.

Image Classification Inductive Bias

Powers of layers for image-to-image translation

no code implementations13 Aug 2020 Hugo Touvron, Matthijs Douze, Matthieu Cord, Hervé Jégou

We propose a simple architecture to address unpaired image-to-image translation tasks: style or class transfer, denoising, deblurring, deblocking, etc.

 Ranked #1 on Image-to-Image Translation on horse2zebra (Frechet Inception Distance metric)

Deblurring Denoising +2

Fixing the train-test resolution discrepancy: FixEfficientNet

1 code implementation18 Mar 2020 Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88. 5% top-1 accuracy (top-5: 98. 7%), which establishes the new state of the art for ImageNet with a single crop.

Ranked #8 on Image Classification on ImageNet ReaL (using extra training data)

Data Augmentation Image Classification

Fixing the train-test resolution discrepancy

3 code implementations NeurIPS 2019 Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86. 4% (top-5: 98. 0%) (single-crop).

Ranked #2 on Fine-Grained Image Classification on Birdsnap (using extra training data)

Data Augmentation Fine-Grained Image Classification +1

Cannot find the paper you are looking for? You can Submit a new open access paper.