LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information in vision transformers. As a result, we propose LeViT: a hybrid neural network for fast-inference image classification. We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of application scenarios. Our extensive experiments empirically validate our technical choices and show they are suitable to most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy trade-off. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. We release the code at https://github.com/facebookresearch/LeViT

ICCV 2021
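The attention bias mentioned in the abstract can be pictured with a small NumPy sketch. It rests on an assumption the abstract does not spell out: each head learns one scalar per absolute offset (|dx|, |dy|) between a query position and a key position, and that scalar is added to the scaled dot-product scores before the softmax. The function names `offset_index` and `attention_with_bias`, the tensor shapes, and the zero-initialised table are illustrative choices, not taken from the released code.

```python
import numpy as np

def offset_index(height, width):
    """Map each (query, key) pair on a height x width grid to an id for (|dy|, |dx|)."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    ys, xs = ys.ravel(), xs.ravel()
    dy = np.abs(ys[:, None] - ys[None, :])
    dx = np.abs(xs[:, None] - xs[None, :])
    return dy * width + dx  # shape (N, N), values in [0, height * width)

def attention_with_bias(q, k, v, bias_table, idx):
    """Scaled dot-product attention with a per-head positional bias (assumed form).

    q, k: (heads, N, d); v: (heads, N, dv); bias_table: (heads, height * width).
    """
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(0, 2, 1)) * scale        # (heads, N, N)
    scores = scores + bias_table[:, idx]               # look up one bias per offset
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage on a 4 x 4 grid of tokens with 2 heads (all sizes are hypothetical).
rng = np.random.default_rng(0)
heads, height, width, d = 2, 4, 4, 8
n = height * width
idx = offset_index(height, width)
q = rng.standard_normal((heads, n, d))
k = rng.standard_normal((heads, n, d))
v = rng.standard_normal((heads, n, d))
bias_table = np.zeros((heads, height * width))  # would be learned in a real model
out = attention_with_bias(q, k, v, bias_table, idx)
```

Because the bias depends only on the relative offset, the table needs `height * width` entries per head rather than one entry per pair of tokens, and no separate positional embedding has to be added to the token features in this sketch.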
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | CIFAR-10 | LeViT-256 | Percentage correct | 98.1 | #49 |
| Image Classification | CIFAR-10 | LeViT-256 | Top-1 Accuracy | 98.1 | #14 |
| Image Classification | CIFAR-10 | LeViT-128 | Percentage correct | 97.6 | #71 |
| Image Classification | CIFAR-10 | LeViT-128 | Top-1 Accuracy | 97.6 | #17 |
| Image Classification | CIFAR-10 | LeViT-384 | Percentage correct | 98 | #52 |
| Image Classification | CIFAR-10 | LeViT-384 | Top-1 Accuracy | 98 | #16 |
| Image Classification | CIFAR-10 | LeViT-192 | Percentage correct | 98.2 | #44 |
| Image Classification | CIFAR-10 | LeViT-192 | Top-1 Accuracy | 98.2 | #13 |
| Image Classification | CIFAR-10 | LeViT-128S | Percentage correct | 97.5 | #73 |
| Image Classification | CIFAR-10 | LeViT-128S | Top-1 Accuracy | 97.5 | #18 |
| Image Classification | Flowers-102 | LeViT-384 | Accuracy | 98.3 | #24 |
| Image Classification | Flowers-102 | LeViT-192 | Accuracy | 97.8 | #34 |
| Image Classification | Flowers-102 | LeViT-256 | Accuracy | 97.7 | #36 |
| Image Classification | Flowers-102 | LeViT-128S | Accuracy | 96.8 | #40 |
| Image Classification | ImageNet | LeViT-128S | Top 1 Accuracy | 75.7% | #860 |
| Image Classification | ImageNet | LeViT-128S | Number of params | 4.7M | #387 |
| Image Classification | ImageNet | LeViT-128S | GFLOPs | 0.288 | #27 |
| Image Classification | ImageNet | LeViT-256 | Top 1 Accuracy | 81.6% | #562 |
| Image Classification | ImageNet | LeViT-256 | Number of params | 17.8M | #520 |
| Image Classification | ImageNet | LeViT-256 | GFLOPs | 1.066 | #108 |
| Image Classification | ImageNet | LeViT-128 | Top 1 Accuracy | 79.6% | #681 |
| Image Classification | ImageNet | LeViT-128 | Number of params | 8.8M | #460 |
| Image Classification | ImageNet | LeViT-128 | GFLOPs | 0.376 | #41 |
| Image Classification | ImageNet | LeViT-192 | Top 1 Accuracy | 80% | #657 |
| Image Classification | ImageNet | LeViT-192 | Number of params | 10.4M | #474 |
| Image Classification | ImageNet | LeViT-192 | GFLOPs | 0.624 | #73 |
| Image Classification | ImageNet | LeViT-384 | Top 1 Accuracy | 82.5% | #476 |
| Image Classification | ImageNet | LeViT-384 | Number of params | 39.4M | #667 |
| Image Classification | ImageNet | LeViT-384 | GFLOPs | 2.334 | #158 |
| Image Classification | ImageNet ReaL | LeViT-384 | Accuracy | 87.5% | #33 |
| Image Classification | ImageNet ReaL | LeViT-128S | Accuracy | 82.6% | #48 |
| Image Classification | ImageNet ReaL | LeViT-128 | Accuracy | 85.6% | #40 |
| Image Classification | ImageNet ReaL | LeViT-192 | Accuracy | 85.8% | #38 |
| Image Classification | ImageNet ReaL | LeViT-256 | Accuracy | 86.9% | #35 |
| Image Classification | ImageNet V2 | LeViT-256 | Top 1 Accuracy | 69.9 | #24 |
| Image Classification | ImageNet V2 | LeViT-192 | Top 1 Accuracy | 68.7 | #27 |
| Image Classification | ImageNet V2 | LeViT-384 | Top 1 Accuracy | 71.4 | #23 |
| Image Classification | ImageNet V2 | LeViT-128 | Top 1 Accuracy | 67.5 | #29 |
| Image Classification | ImageNet V2 | LeViT-128S | Top 1 Accuracy | 63.9 | #33 |
| Image Classification | iNaturalist 2018 | LeViT-192 | Top-1 Accuracy | 60.4% | #48 |
| Image Classification | iNaturalist 2018 | LeViT-256 | Top-1 Accuracy | 66.2% | #41 |
| Image Classification | iNaturalist 2018 | LeViT-128S | Top-1 Accuracy | 55.2% | #51 |
| Image Classification | iNaturalist 2018 | LeViT-384 | Top-1 Accuracy | 66.9% | #40 |
| Image Classification | iNaturalist 2018 | LeViT-128 | Top-1 Accuracy | 54% | #52 |
| Image Classification | iNaturalist 2019 | LeViT-128 | Top-1 Accuracy | 68.4 | #17 |
| Image Classification | iNaturalist 2019 | LeViT-192 | Top-1 Accuracy | 70.8 | #16 |
| Image Classification | iNaturalist 2019 | LeViT-256 | Top-1 Accuracy | 72.3 | #14 |
| Image Classification | iNaturalist 2019 | LeViT-128S | Top-1 Accuracy | 66.5 | #18 |
| Image Classification | iNaturalist 2019 | LeViT-384 | Top-1 Accuracy | 74.3 | #11 |
| Image Classification | Stanford Cars | LeViT-256 | Accuracy | 88.2 | #19 |
| Image Classification | Stanford Cars | LeViT-128S | Accuracy | 88.4 | #18 |
| Image Classification | Stanford Cars | LeViT-384 | Accuracy | 89.3 | #16 |
| Image Classification | Stanford Cars | LeViT-128 | Accuracy | 88.6 | #17 |
| Image Classification | Stanford Cars | LeViT-192 | Accuracy | 89.8 | #14 |

Methods