TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	sMLPNet-B (ImageNet-1k)	Top 1 Accuracy	83.4%	# 394
Image Classification	ImageNet	sMLPNet-B (ImageNet-1k)	Number of params	65.9M	# 776
Image Classification	ImageNet	sMLPNet-S (ImageNet-1k)	Top 1 Accuracy	83.1%	# 426
Image Classification	ImageNet	sMLPNet-S (ImageNet-1k)	Number of params	48.6M	# 717
Image Classification	ImageNet	sMLPNet-T (ImageNet-1k)	Top 1 Accuracy	81.9%	# 543
Image Classification	ImageNet	sMLPNet-T (ImageNet-1k)	Number of params	24.1M	# 581

Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?

12 Sep 2021 · Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng

Transformers have sprung up in the field of computer vision. In this work, we explore whether the core self-attention module in the Transformer is the key to achieving excellent performance in image recognition. To this end, we build an attention-free network called sMLPNet based on existing MLP-based vision models. Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module. For 2D image tokens, sMLP applies a 1D MLP along the axial directions, and the parameters are shared among rows or columns. Through sparse connections and weight sharing, the sMLP module significantly reduces the number of model parameters and the computational complexity, avoiding the over-fitting problem that commonly degrades the performance of MLP-like models. When trained only on the ImageNet-1K dataset, the proposed sMLPNet achieves 81.9% top-1 accuracy with only 24M parameters, which is much better than most CNNs and vision Transformers under the same model size constraint. When scaled up to 66M parameters, sMLPNet achieves 83.4% top-1 accuracy, which is on par with the state-of-the-art Swin Transformer. The success of sMLPNet suggests that the self-attention mechanism is not necessarily a silver bullet in computer vision. The code and models are publicly available at https://github.com/microsoft/SPACH
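
The axial mixing described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see the official repository for that). It assumes a single-channel H x W grid of tokens, one linear layer per axis with no bias, and hypothetical helper names (`mix_rows`, `axial_mix`, `token_mixing_weights`). How the two axial outputs are combined is not stated in the abstract, so they are simply returned separately here.

```python
def mix_rows(grid, weight):
    """Apply the same linear map `weight` (W x W) to every row of `grid` (H x W).

    Reusing one weight matrix for all rows is the weight sharing
    mentioned in the abstract.
    """
    return [
        [sum(w * v for w, v in zip(w_row, row)) for w_row in weight]
        for row in grid
    ]


def transpose(grid):
    """Swap the two axes of a nested-list grid."""
    return [list(col) for col in zip(*grid)]


def axial_mix(grid, w_h, w_w):
    """Mix tokens along each axis separately (illustrative sketch).

    grid: H x W nested list, w_h: H x H weights, w_w: W x W weights.
    Returns (mixed along height, mixed along width).
    """
    mixed_w = mix_rows(grid, w_w)  # tokens interact only within their row
    mixed_h = transpose(mix_rows(transpose(grid), w_h))  # only within their column
    return mixed_h, mixed_w


def token_mixing_weights(h, w):
    """Weight counts (biases ignored) for dense vs. axial token mixing."""
    dense = (h * w) ** 2   # every token connected to every other token
    axial = h * h + w * w  # one shared matrix per axis
    return dense, axial


if __name__ == "__main__":
    grid = [[1, 2], [3, 4]]
    w_h = [[1, 1], [0, 0]]  # first output row sums each column
    w_w = [[0, 1], [1, 0]]  # swaps the two entries of each row
    print(axial_mix(grid, w_h, w_w))
    print(token_mixing_weights(14, 14))  # hypothetical 14 x 14 token grid
```

Under these assumptions, a single dense token-mixing layer over a 14 x 14 grid needs 38,416 weights, while the two shared axial matrices need 392, which shows where the parameter savings from sparse connection and weight sharing come from.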

Code

microsoft/SPACH (official), 191 stars

liuruiyang98/Jittor-MLP, 160 stars

Tasks

Image Classification

Datasets

ImageNet

Results from the Paper

Ranked #391 on Image Classification on ImageNet

Methods

Absolute Position Encodings • Adam • Average Pooling • BPE • Dense Connections • Dropout • GELU • Global Average Pooling • Label Smoothing • Layer Normalization • Linear Layer • MLP-Mixer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Stochastic Depth • Swin Transformer • Transformer
