ResMLP: Feedforward networks for image classification with data-efficient training

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data augmentation and, optionally, distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove the priors that come from using a labelled dataset. Finally, adapting our model to machine translation yields surprisingly good results on WMT2014 English-French and English-German. We share pre-trained models and our code, which is based on the Timm library.
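The alternation described in the abstract can be sketched in a few lines. The following is a minimal, dependency-free Python illustration of one residual layer, assuming the image has already been split into N patch embeddings of d channels each. The helper names (`resmlp_layer`, `matvec`, `rand_matrix`) and the GELU nonlinearity between the two feed-forward layers are our assumptions, and any normalization or scaling inside the residual branches is omitted; this is not the authors' Timm-based implementation.

```python
import math
import random


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF (assumed activation).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(w, b, v):
    # Dense layer: w is a list of `out` rows, each of length `in`; b has length `out`.
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rand_matrix(rows, cols, rng):
    # Hypothetical small random initialization, for illustration only.
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def resmlp_layer(x, a, a_bias, w1, b1, w2, b2):
    """One residual layer on x: a list of N patches, each a list of d channels."""
    # (i) Cross-patch linear layer: the same N x N matrix `a` mixes the patches,
    #     applied independently and identically to every channel (column of x).
    mixed = [matvec(a, a_bias, col) for col in transpose(x)]  # d lists of length N
    z = [[xi + mi for xi, mi in zip(xr, mr)] for xr, mr in zip(x, transpose(mixed))]
    # (ii) Two-layer feed-forward network: mixes channels, independently per patch.
    out = []
    for patch in z:
        h = [gelu(v) for v in matvec(w1, b1, patch)]
        y = matvec(w2, b2, h)
        out.append([pi + yi for pi, yi in zip(patch, y)])
    return out


rng = random.Random(0)
n_patches, dim, hidden = 4, 3, 12
x = rand_matrix(n_patches, dim, rng)
y = resmlp_layer(
    x,
    rand_matrix(n_patches, n_patches, rng), [0.0] * n_patches,  # cross-patch layer
    rand_matrix(hidden, dim, rng), [0.0] * hidden,              # feed-forward layer 1
    rand_matrix(dim, hidden, rng), [0.0] * dim,                 # feed-forward layer 2
)
print(len(y), "patches x", len(y[0]), "channels")
```

The output keeps the N x d shape of the input, as a residual layer must. With all weights and biases set to zero, both branches output zero and the layer reduces to the identity, which is a quick check that the skip connections are wired correctly.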

Published at NeurIPS 2021.
| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Image Classification | Certificate Verification | ResMLP-24 | Percentage correct | 98.7% | #1 |
| Image Classification | Certificate Verification | ResMLP-24 | Top-1 Accuracy | 98.7% | #1 |
| Image Classification | Certificate Verification | ResMLP-12 | Percentage correct | 98.1% | #3 |
| Image Classification | Certificate Verification | ResMLP-12 | Top-1 Accuracy | 98.1% | #3 |
| Image Classification | CIFAR-100 | ResMLP-12 | Percentage correct | 87.0% | #48 |
| Image Classification | CIFAR-100 | ResMLP-24 | Percentage correct | 89.5% | #30 |
| Image Classification | Flowers-102 | ResMLP-24 | Accuracy | 97.9% | #29 |
| Image Classification | Flowers-102 | ResMLP-12 | Accuracy | 97.4% | #37 |
| Self-Supervised Image Classification | ImageNet | DINO (ResMLP-12) | Top-1 Accuracy | 67.5% | #102 |
| Self-Supervised Image Classification | ImageNet | DINO (ResMLP-12) | Number of params | 15M | #81 |
| Self-Supervised Image Classification | ImageNet | DINO (ResMLP-24) | Top-1 Accuracy | 72.8% | #88 |
| Self-Supervised Image Classification | ImageNet | DINO (ResMLP-24) | Number of params | 30M | #46 |
| Image Classification | ImageNet | ResMLP-36 | Top-1 Accuracy | 79.7% | #685 |
| Image Classification | ImageNet | ResMLP-36 | Number of params | 45M | #707 |
| Image Classification | ImageNet | ResMLP-24 | Top-1 Accuracy | 79.4% | #695 |
| Image Classification | ImageNet | ResMLP-S12 | Top-1 Accuracy | 77.8% | #795 |
| Image Classification | ImageNet | ResMLP-S12 | Number of params | 15.4M | #516 |
| Image Classification | ImageNet | ResMLP-12 (distilled, class-MLP) | Top-1 Accuracy | 78.6% | #753 |
| Image Classification | ImageNet | ResMLP-12 (distilled, class-MLP) | Number of params | 17.7M | #524 |
| Image Classification | ImageNet | ResMLP-12 (distilled, class-MLP) | GFLOPs | 3 | #174 |
| Image Classification | ImageNet | ResMLP-S24 | Top-1 Accuracy | 80.8% | #623 |
| Image Classification | ImageNet | ResMLP-S24 | Number of params | 30M | #646 |
| Image Classification | ImageNet | ResMLP-S24 | GFLOPs | 6 | #242 |
| Image Classification | ImageNet | ResMLP-B24/8 | Top-1 Accuracy | 83.6% | #378 |
| Image Classification | ImageNet | ResMLP-B24/8 | Number of params | 116M | #875 |
| Image Classification | ImageNet ReaL | ResMLP-36 | Accuracy | 85.6% | #40 |
| Image Classification | ImageNet ReaL | ResMLP-36 | Number of params | 45M | #42 |
| Image Classification | ImageNet ReaL | ResMLP-B24/8 (22k) | Top-1 Accuracy | 84.4% | #5 |
| Image Classification | ImageNet ReaL | ResMLP-12 | Accuracy | 84.6% | #44 |
| Image Classification | ImageNet ReaL | ResMLP-12 | Number of params | 15M | #38 |
| Image Classification | ImageNet ReaL | ResMLP-24 | Accuracy | 85.3% | #42 |
| Image Classification | ImageNet ReaL | ResMLP-24 | Number of params | 30M | #41 |
| Image Classification | ImageNet V2 | ResMLP-S24/16 | Top-1 Accuracy | 69.8% | #25 |
| Image Classification | ImageNet V2 | ResMLP-B24/8 | Top-1 Accuracy | 73.4% | #20 |
| Image Classification | ImageNet V2 | ResMLP-S12/16 | Top-1 Accuracy | 66.0% | #31 |
| Image Classification | ImageNet V2 | ResMLP-B24/8 (22k) | Top-1 Accuracy | 74.2% | #18 |
| Image Classification | iNaturalist 2018 | ResMLP-24 | Top-1 Accuracy | 64.3% | #45 |
| Image Classification | iNaturalist 2018 | ResMLP-12 | Top-1 Accuracy | 60.2% | #49 |
| Image Classification | iNaturalist 2019 | ResMLP-12 | Top-1 Accuracy | 71.0% | #15 |
| Image Classification | iNaturalist 2019 | ResMLP-24 | Top-1 Accuracy | 72.5% | #13 |
| Fine-Grained Image Classification | Oxford 102 Flowers | ResMLP-12 | Accuracy | 97.4% | #20 |
| Fine-Grained Image Classification | Oxford 102 Flowers | ResMLP-24 | Accuracy | 97.9% | #16 |
| Fine-Grained Image Classification | Stanford Cars | ResMLP-24 | Accuracy | 89.5% | #70 |
| Image Classification | Stanford Cars | ResMLP-24 | Accuracy | 89.5% | #15 |
| Image Classification | Stanford Cars | ResMLP-12 | Accuracy | 84.6% | #20 |
| Fine-Grained Image Classification | Stanford Cars | ResMLP-12 | Accuracy | 84.6% | #72 |
| Machine Translation | WMT2014 English-French | ResMLP-12 | BLEU score | 40.6 | #28 |
| Machine Translation | WMT2014 English-French | ResMLP-6 | BLEU score | 40.3 | #33 |
| Machine Translation | WMT2014 English-German | ResMLP-6 | BLEU score | 26.4 | #59 |
| Machine Translation | WMT2014 English-German | ResMLP-12 | BLEU score | 26.8 | #56 |
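As a rough sanity check on the parameter counts listed above, the sketch below tallies the weights of a 12-layer model. The hyperparameters (width 384, 16×16 patches on 224×224 inputs giving 196 patches, a 4× hidden expansion in the feed-forward network, 1000 classes) are assumptions not stated on this page, as is the 6·d per-block budget for per-channel scale and shift parameters; `resmlp_param_count` is a hypothetical helper. Under these assumptions the count comes to about 15.4M, consistent with the ResMLP-S12 entry.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer mapping n_in -> n_out features."""
    return n_in * n_out + n_out


def resmlp_param_count(depth, dim, image_size, patch_size, mlp_ratio, in_chans, n_classes):
    n_patches = (image_size // patch_size) ** 2
    # Patch embedding: a linear projection of each flattened patch to `dim` channels.
    patch_embed = linear_params(in_chans * patch_size * patch_size, dim)
    per_block = (
        linear_params(n_patches, n_patches)      # (i) cross-patch layer, shared across channels
        + linear_params(dim, mlp_ratio * dim)    # (ii) feed-forward, first layer
        + linear_params(mlp_ratio * dim, dim)    # (ii) feed-forward, second layer
        + 6 * dim                                # assumed per-channel scale/shift parameters
    )
    final_affine = 2 * dim                       # assumed final per-channel scale and shift
    head = linear_params(dim, n_classes)         # linear classifier
    return patch_embed + depth * per_block + final_affine + head


total = resmlp_param_count(depth=12, dim=384, image_size=224, patch_size=16,
                           mlp_ratio=4, in_chans=3, n_classes=1000)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 15,350,872 parameters (~15.4M)
```

Under these assumed settings, the cross-patch layer contributes only N² + N = 38,612 parameters per block, while the feed-forward network contributes 1,181,568, so the per-patch channel MLP dominates the budget.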

Methods