ResMLP: Feedforward networks for image classification with data-efficient training

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch... When trained with a modern training strategy using heavy data augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labelled dataset. Finally, by adapting our model to machine translation, we achieve surprisingly good results. We share pre-trained models and our code based on the Timm library.
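The alternation described in the abstract can be illustrated with a minimal NumPy sketch of a single residual block. This is not the authors' implementation: the function names (`gelu`, `resmlp_block`), the weight names, the GELU activation, and the toy sizes are assumptions made for illustration, and biases, normalization, and any other details of the released models are left out.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU (the choice of activation is an assumption here)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def resmlp_block(x, w_patch, w1, w2):
    """One residual block on x of shape (num_patches, channels).

    w_patch: (num_patches, num_patches)  cross-patch linear layer
    w1:      (channels, hidden)          first layer of the per-patch FFN
    w2:      (hidden, channels)          second layer of the per-patch FFN
    """
    # (i) patches interact; the same weights are applied to every channel column
    x = x + w_patch @ x
    # (ii) channels interact; the same two-layer FFN is applied to every patch row
    x = x + gelu(x @ w1) @ w2
    return x


# Toy sizes, chosen only for illustration
rng = np.random.default_rng(0)
num_patches, channels, hidden = 16, 8, 32
x = rng.standard_normal((num_patches, channels))
w_patch = 0.02 * rng.standard_normal((num_patches, num_patches))
w1 = 0.02 * rng.standard_normal((channels, hidden))
w2 = 0.02 * rng.standard_normal((hidden, channels))

y = resmlp_block(x, w_patch, w1, w2)  # same shape as x
```

Because both sub-layers are residual, a block whose weights are all zero returns its input unchanged, and stacking such blocks preserves the `(num_patches, channels)` shape throughout.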

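| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | ImageNet | ResMLP-12 | Top 1 Accuracy | 77.8% | # 299 |
| Image Classification | ImageNet | ResMLP-12 | Number of params | 15M | # 184 |
| Image Classification | ImageNet | ResMLP-24 | Top 1 Accuracy | 79.4% | # 238 |
| Image Classification | ImageNet | ResMLP-24 | Number of params | 30M | # 140 |
| Image Classification | ImageNet | ResMLP-36 | Top 1 Accuracy | 79.7% | # 235 |
| Image Classification | ImageNet | ResMLP-36 | Number of params | 45M | # 112 |
| Image Classification | ImageNet ReaL | ResMLP-36 | Accuracy | 85.6% | # 28 |
| Image Classification | ImageNet ReaL | ResMLP-36 | Params | 45M | # 30 |
| Image Classification | ImageNet ReaL | ResMLP-12 | Accuracy | 84.6% | # 33 |
| Image Classification | ImageNet ReaL | ResMLP-12 | Params | 15M | # 27 |
| Image Classification | ImageNet ReaL | ResMLP-24 | Accuracy | 85.3% | # 30 |
| Image Classification | ImageNet ReaL | ResMLP-24 | Params | 30M | # 29 |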