LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information in vision transformers. As a result, we propose LeViT: a hybrid neural network for fast-inference image classification. We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of application scenarios. Our extensive experiments empirically validate our technical choices and show that they are suitable for most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy trade-off. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. We release the code at https://github.com/facebookresearch/LeViT
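The abstract names two architectural ideas: activation maps whose resolution decreases from stage to stage, and an attention bias that carries positional information. The following is a minimal NumPy sketch of both, not the code from the facebookresearch/LeViT repository. It assumes a 224×224 input, a stem that downsamples by 16, and stride-2 subsampling between three stages. It also assumes the bias is a learned per-head table indexed by the absolute offset (|dy|, |dx|) between query and key patches and added to scaled dot-product attention logits. The function names `stage_resolutions`, `attention_bias_index`, and `biased_attention` are hypothetical.

```python
import numpy as np


def stage_resolutions(image_size: int = 224, stem_stride: int = 16, num_stages: int = 3) -> list[int]:
    """Side length of the activation map in each stage (assumed stem + stride-2 subsampling)."""
    res = image_size // stem_stride
    sizes = [res]
    for _ in range(num_stages - 1):
        res = (res - 1) // 2 + 1  # equals ceil(res / 2), as with stride-2 subsampling
        sizes.append(res)
    return sizes


def attention_bias_index(res: int) -> np.ndarray:
    """Map every (query, key) patch pair on a res x res grid to an id of its offset (|dy|, |dx|)."""
    rows, cols = np.divmod(np.arange(res * res), res)  # grid position of each patch
    dy = np.abs(rows[:, None] - rows[None, :])
    dx = np.abs(cols[:, None] - cols[None, :])
    return dy * res + dx  # one id per distinct offset, in [0, res * res)


def biased_attention(q, k, v, bias_table, bias_idx):
    """q, k, v: (heads, N, d); bias_table: (heads, res * res); bias_idx: (N, N)."""
    d = q.shape[-1]
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)  # (heads, N, N)
    logits = logits + bias_table[:, bias_idx]           # per-head bias looked up by offset
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                                   # (heads, N, d)


print(stage_resolutions())  # [14, 7, 4]

rng = np.random.default_rng(0)
heads, res, d = 4, 7, 16
n = res * res
q, k, v = (rng.standard_normal((heads, n, d)) for _ in range(3))
bias_table = np.zeros((heads, res * res))  # would be learned during training; zeros here
out = biased_attention(q, k, v, bias_table, attention_bias_index(res))
print(out.shape)  # (4, 49, 16)
```

Because the bias in this sketch depends only on the relative offset between two patches, it is translation-invariant and costs one small table per head, rather than a positional embedding added to the input tokens.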

Results from the Paper


Ranked #3 on Image Classification on ImageNet V2 (using extra training data)

| Task | Dataset | Model | Metric | Value (%) | Global Rank |
|---|---|---|---|---|---|
| Image Classification | CIFAR-10 | LeViT-128S | Percentage correct | 97.5 | #54 |
| Image Classification | CIFAR-10 | LeViT-128 | Percentage correct | 97.6 | #52 |
| Image Classification | CIFAR-10 | LeViT-192 | Percentage correct | 98.2 | #34 |
| Image Classification | CIFAR-10 | LeViT-256 | Percentage correct | 98.1 | #38 |
| Image Classification | CIFAR-10 | LeViT-384 | Percentage correct | 98.0 | #41 |
| Image Classification | CIFAR-100 | LeViT-128S | Percentage correct | 80.3 | #80 |
| Image Classification | CIFAR-100 | LeViT-128 | Percentage correct | 85.0 | #51 |
| Image Classification | CIFAR-100 | LeViT-192 | Percentage correct | 85.5 | #48 |
| Image Classification | CIFAR-100 | LeViT-256 | Percentage correct | 86.0 | #44 |
| Image Classification | CIFAR-100 | LeViT-384 | Percentage correct | 86.2 | #41 |
| Image Classification | Flowers-102 | LeViT-128S | Accuracy | 96.8 | #31 |
| Image Classification | Flowers-102 | LeViT-192 | Accuracy | 97.8 | #26 |
| Image Classification | Flowers-102 | LeViT-256 | Accuracy | 97.7 | #28 |
| Image Classification | Flowers-102 | LeViT-384 | Accuracy | 98.3 | #19 |
| Image Classification | ImageNet | LeViT-128S | Top-1 Accuracy | 75.7 | #341 |
| Image Classification | ImageNet | LeViT-128 | Top-1 Accuracy | 79.6 | #236 |
| Image Classification | ImageNet | LeViT-192 | Top-1 Accuracy | 80.0 | #221 |
| Image Classification | ImageNet | LeViT-256 | Top-1 Accuracy | 81.6 | #179 |
| Image Classification | ImageNet | LeViT-384 | Top-1 Accuracy | 82.5 | #147 |
| Image Classification | ImageNet ReaL | LeViT-128S | Accuracy | 82.6 | #37 |
| Image Classification | ImageNet ReaL | LeViT-128 | Accuracy | 85.6 | #28 |
| Image Classification | ImageNet ReaL | LeViT-192 | Accuracy | 85.8 | #27 |
| Image Classification | ImageNet ReaL | LeViT-256 | Accuracy | 86.9 | #24 |
| Image Classification | ImageNet ReaL | LeViT-384 | Accuracy | 87.5 | #22 |
| Image Classification | ImageNet V2 | LeViT-128S | Top-1 Accuracy | 63.9 | #10 |
| Image Classification | ImageNet V2 | LeViT-128 | Top-1 Accuracy | 67.5 | #7 |
| Image Classification | ImageNet V2 | LeViT-192 | Top-1 Accuracy | 68.7 | #6 |
| Image Classification | ImageNet V2 | LeViT-256 | Top-1 Accuracy | 69.9 | #4 |
| Image Classification | ImageNet V2 | LeViT-384 | Top-1 Accuracy | 71.4 | #3 |
| Image Classification | iNaturalist 2018 | LeViT-128S | Top-1 Accuracy | 55.2 | #21 |
| Image Classification | iNaturalist 2018 | LeViT-128 | Top-1 Accuracy | 54.0 | #22 |
| Image Classification | iNaturalist 2018 | LeViT-192 | Top-1 Accuracy | 60.4 | #19 |
| Image Classification | iNaturalist 2018 | LeViT-256 | Top-1 Accuracy | 66.2 | #15 |
| Image Classification | iNaturalist 2018 | LeViT-384 | Top-1 Accuracy | 66.9 | #14 |
| Image Classification | iNaturalist 2019 | LeViT-128S | Top-1 Accuracy | 66.5 | #11 |
| Image Classification | iNaturalist 2019 | LeViT-128 | Top-1 Accuracy | 68.4 | #10 |
| Image Classification | iNaturalist 2019 | LeViT-192 | Top-1 Accuracy | 70.8 | #9 |
| Image Classification | iNaturalist 2019 | LeViT-256 | Top-1 Accuracy | 72.3 | #8 |
| Image Classification | iNaturalist 2019 | LeViT-384 | Top-1 Accuracy | 74.3 | #6 |
| Image Classification | Stanford Cars | LeViT-128S | Accuracy | 88.4 | #14 |
| Image Classification | Stanford Cars | LeViT-128 | Accuracy | 88.6 | #13 |
| Image Classification | Stanford Cars | LeViT-192 | Accuracy | 89.8 | #11 |
| Image Classification | Stanford Cars | LeViT-256 | Accuracy | 88.2 | #15 |
| Image Classification | Stanford Cars | LeViT-384 | Accuracy | 89.3 | #12 |

Methods