TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	Mixer-B/16	Top 1 Accuracy	76.44%	# 843
Image Classification	ImageNet	Mixer-B/16	Number of params	46M	# 709
Image Classification	ImageNet	Mixer-H/14 (JFT-300M pre-train)	Top 1 Accuracy	87.94%	# 73
Image Classification	ImageNet	Mixer-H/14 (JFT-300M pre-train)	Hardware Burden	None	# 1
Image Classification	ImageNet	Mixer-H/14 (JFT-300M pre-train)	Operations per network pass	None	# 1
Image Classification	ImageNet	ViT-L/16 Dosovitskiy et al. (2021)	Top 1 Accuracy	85.3%	# 231
Image Classification	ImageNet ReaL	Mixer-H/14-448 (JFT-300M pre-train)	Accuracy	90.18%	# 20
Image Classification	ImageNet ReaL	Mixer-H/14-448 (JFT-300M pre-train)	Params	409M	# 47
Image Classification	ImageNet ReaL	Mixer-H/14 (JFT-300M pre-train)	Accuracy	87.86%	# 30
Image Classification	ImageNet ReaL	Mixer-H/14 (JFT-300M pre-train)	Params	409M	# 47
Image Classification	OmniBenchmark	MLP-Mixer	Average Top-1 Accuracy	32.2	# 17

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mlp-mixer-an-all-mlp-architecture-for-vision/image-classification-on-omnibenchmark)](https://paperswithcode.com/sota/image-classification-on-omnibenchmark?p=mlp-mixer-an-all-mlp-architecture-for-vision)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mlp-mixer-an-all-mlp-architecture-for-vision/image-classification-on-imagenet-real)](https://paperswithcode.com/sota/image-classification-on-imagenet-real?p=mlp-mixer-an-all-mlp-architecture-for-vision)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mlp-mixer-an-all-mlp-architecture-for-vision/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=mlp-mixer-an-all-mlp-architecture-for-vision)`

MLP-Mixer: An all-MLP Architecture for Vision

NeurIPS 2021 · Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them is necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well-established CNNs and Transformers.
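
The two layer types described in the abstract can be illustrated with a small NumPy sketch. This is not the official implementation (that lives in google-research/vision_transformer); it is a toy version under stated assumptions. The function names (`mixer_layer`, `init_mlp`, and so on), the toy sizes, the weight initialization, and the exact placement of layer normalization and residual connections are choices made here for illustration; GELU, layer normalization, and residual connections themselves appear in the Methods list for this paper.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis; learnable scale and shift are omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron acting on the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def init_mlp(rng, dim, hidden):
    # Hypothetical initialization: small Gaussian weights, zero biases.
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mixer_layer(x, token_params, channel_params):
    # x has shape (num_patches, channels).
    # Token mixing: transpose so the MLP acts across patches,
    # with the same weights shared by every channel.
    y = layer_norm(x).T                      # (channels, num_patches)
    x = x + mlp(y, *token_params).T          # residual connection
    # Channel mixing: the MLP acts on each patch's feature vector independently.
    return x + mlp(layer_norm(x), *channel_params)

rng = np.random.default_rng(0)
num_patches, channels = 16, 8                # toy sizes, chosen only for illustration
x = rng.normal(size=(num_patches, channels))
token_params = init_mlp(rng, num_patches, 32)
channel_params = init_mlp(rng, channels, 32)
out = mixer_layer(x, token_params, channel_params)
print(out.shape)  # (16, 8)
```

The only difference between the two mixing steps is the axis the MLP sees: transposing the patch-by-channel table turns a per-patch MLP into a cross-patch one, so no convolution or attention is needed to exchange spatial information.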

Code

Repository	Stars
google-research/vision_transformer (official)	9,280
labmlai/annotated_deep_learning_pap…	48,096
rwightman/pytorch-image-models	29,774
xmu-xiaoma666/External-Attention-py…	10,853
open-mmlab/mmclassification	3,157

46 implementations listed in total.

Tasks

Image Classification

Datasets

CIFAR-10

ImageNet

JFT-300M JFT-3B

OmniBenchmark

Results from the Paper

Ranked #17 on Image Classification on OmniBenchmark

Methods

Absolute Position Encodings • Adam • Average Pooling • BPE • Dense Connections • Dropout • GELU • Global Average Pooling • Label Smoothing • Layer Normalization • Linear Layer • Mixer Layer • MLP-Mixer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
