ResMLP: Feedforward networks for image classification with data-efficient training

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labelled dataset. Finally, by adapting our model to machine translation we achieve surprisingly good results. We share pre-trained models and our code based on the Timm library.

NeurIPS 2021
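
To make the block structure concrete, the snippet below is a minimal PyTorch sketch of the residual block described in the abstract: a per-channel affine rescaling in place of normalization, a linear layer that mixes patches (shared identically across channels), and a two-layer feed-forward network that mixes channels independently per patch. Class and argument names here are illustrative, and the sketch omits details of the released implementation (e.g. the LayerScale-style rescaling of the residual branches).

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    # Per-channel affine transform used in place of a normalization layer.
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):  # x: (batch, num_patches, dim)
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    # One residual block: (i) linear mixing across patches, shared over channels,
    # (ii) two-layer MLP across channels, applied independently to each patch.
    def __init__(self, dim, num_patches, mlp_ratio=4):
        super().__init__()
        self.affine1 = Affine(dim)
        self.patch_mix = nn.Linear(num_patches, num_patches)  # cross-patch sublayer
        self.affine2 = Affine(dim)
        self.channel_mlp = nn.Sequential(                     # cross-channel sublayer
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):  # x: (batch, num_patches, dim)
        # (i) patches interact: transpose so the linear layer acts along the patch axis
        x = x + self.patch_mix(self.affine1(x).transpose(1, 2)).transpose(1, 2)
        # (ii) channels interact independently for each patch
        x = x + self.channel_mlp(self.affine2(x))
        return x
```

A full model stacks such blocks on top of a linear patch-embedding projection, then averages the patch representations and feeds them to a linear classifier.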
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Image Classification | Certificate Verification | ResMLP-24 | Percentage correct | 98.7 | # 1
Image Classification | Certificate Verification | ResMLP-24 | Top-1 Accuracy | 98.7 | # 1
Image Classification | Certificate Verification | ResMLP-12 | Percentage correct | 98.1 | # 3
Image Classification | Certificate Verification | ResMLP-12 | Top-1 Accuracy | 98.1 | # 3
Image Classification | CIFAR-100 | ResMLP-12 | Percentage correct | 87.0 | # 48
Image Classification | CIFAR-100 | ResMLP-24 | Percentage correct | 89.5 | # 30
Image Classification | Flowers-102 | ResMLP-24 | Accuracy | 97.9 | # 29
Image Classification | Flowers-102 | ResMLP-12 | Accuracy | 97.4 | # 37
Self-Supervised Image Classification | ImageNet | DINO (ResMLP-12) | Top-1 Accuracy | 67.5% | # 100
Self-Supervised Image Classification | ImageNet | DINO (ResMLP-12) | Number of params | 15M | # 80
Self-Supervised Image Classification | ImageNet | DINO (ResMLP-24) | Top-1 Accuracy | 72.8% | # 86
Self-Supervised Image Classification | ImageNet | DINO (ResMLP-24) | Number of params | 30M | # 46
Image Classification | ImageNet | ResMLP-36 | Top-1 Accuracy | 79.7% | # 678
Image Classification | ImageNet | ResMLP-36 | Number of params | 45M | # 701
Image Classification | ImageNet | ResMLP-24 | Top-1 Accuracy | 79.4% | # 688
Image Classification | ImageNet | ResMLP-S12 | Top-1 Accuracy | 77.8% | # 788
Image Classification | ImageNet | ResMLP-S12 | Number of params | 15.4M | # 511
Image Classification | ImageNet | ResMLP-12 (distilled, class-MLP) | Top-1 Accuracy | 78.6% | # 746
Image Classification | ImageNet | ResMLP-12 (distilled, class-MLP) | Number of params | 17.7M | # 519
Image Classification | ImageNet | ResMLP-12 (distilled, class-MLP) | GFLOPs | 3 | # 174
Image Classification | ImageNet | ResMLP-S24 | Top-1 Accuracy | 80.8% | # 616
Image Classification | ImageNet | ResMLP-S24 | Number of params | 30M | # 640
Image Classification | ImageNet | ResMLP-S24 | GFLOPs | 6 | # 242
Image Classification | ImageNet | ResMLP-B24/8 | Top-1 Accuracy | 83.6% | # 373
Image Classification | ImageNet | ResMLP-B24/8 | Number of params | 116M | # 868
Image Classification | ImageNet ReaL | ResMLP-36 | Accuracy | 85.6% | # 40
Image Classification | ImageNet ReaL | ResMLP-36 | Params | 45M | # 42
Image Classification | ImageNet ReaL | ResMLP-B24/8 (22k) | Top-1 Accuracy | 84.4% | # 5
Image Classification | ImageNet ReaL | ResMLP-12 | Accuracy | 84.6% | # 44
Image Classification | ImageNet ReaL | ResMLP-12 | Params | 15M | # 38
Image Classification | ImageNet ReaL | ResMLP-24 | Accuracy | 85.3% | # 42
Image Classification | ImageNet ReaL | ResMLP-24 | Params | 30M | # 41
Image Classification | ImageNet V2 | ResMLP-S24/16 | Top-1 Accuracy | 69.8 | # 25
Image Classification | ImageNet V2 | ResMLP-B24/8 | Top-1 Accuracy | 73.4 | # 20
Image Classification | ImageNet V2 | ResMLP-S12/16 | Top-1 Accuracy | 66.0 | # 31
Image Classification | ImageNet V2 | ResMLP-B24/8 (22k) | Top-1 Accuracy | 74.2 | # 18
Image Classification | iNaturalist 2018 | ResMLP-24 | Top-1 Accuracy | 64.3 | # 45
Image Classification | iNaturalist 2018 | ResMLP-12 | Top-1 Accuracy | 60.2 | # 49
Image Classification | iNaturalist 2019 | ResMLP-12 | Top-1 Accuracy | 71.0 | # 15
Image Classification | iNaturalist 2019 | ResMLP-24 | Top-1 Accuracy | 72.5 | # 13
Fine-Grained Image Classification | Oxford 102 Flowers | ResMLP-12 | Accuracy | 97.4% | # 20
Fine-Grained Image Classification | Oxford 102 Flowers | ResMLP-24 | Accuracy | 97.9% | # 16
Fine-Grained Image Classification | Stanford Cars | ResMLP-24 | Accuracy | 89.5% | # 69
Image Classification | Stanford Cars | ResMLP-24 | Accuracy | 89.5 | # 15
Image Classification | Stanford Cars | ResMLP-12 | Accuracy | 84.6 | # 20
Fine-Grained Image Classification | Stanford Cars | ResMLP-12 | Accuracy | 84.6% | # 71
Machine Translation | WMT2014 English-French | ResMLP-12 | BLEU score | 40.6 | # 28
Machine Translation | WMT2014 English-French | ResMLP-6 | BLEU score | 40.3 | # 33
Machine Translation | WMT2014 English-German | ResMLP-6 | BLEU score | 26.4 | # 59
Machine Translation | WMT2014 English-German | ResMLP-12 | BLEU score | 26.8 | # 56