TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Classification	ImageNet	gMLP-B	Top 1 Accuracy	81.6%	# 569
Image Classification	ImageNet	gMLP-B	Number of params	73M	# 793
Image Classification	ImageNet	gMLP-B	GFLOPs	31.6	# 395
Natural Language Inference	MultiNLI	gMLP-large	Matched	86.2	# 25
Natural Language Inference	MultiNLI	gMLP-large	Mismatched	86.5	# 13
Question Answering	SQuAD2.0	gMLP-large	F1	78.3	# 226
Sentiment Analysis	SST-2 Binary classification	gMLP-large	Accuracy	94.8	# 28

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-attention-to-mlps/natural-language-inference-on-multinli)](https://paperswithcode.com/sota/natural-language-inference-on-multinli?p=pay-attention-to-mlps)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-attention-to-mlps/sentiment-analysis-on-sst-2-binary)](https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary?p=pay-attention-to-mlps)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-attention-to-mlps/question-answering-on-squad20)](https://paperswithcode.com/sota/question-answering-on-squad20?p=pay-attention-to-mlps)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-attention-to-mlps/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=pay-attention-to-mlps)`

Pay Attention to MLPs

NeurIPS 2021 · Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le ·

Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years. Here we propose a simple network architecture, gMLP, based on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications. Our comparisons show that self-attention is not critical for Vision Transformers, as gMLP can achieve the same accuracy. For BERT, our model achieves parity with Transformers on pretraining perplexity and is better on some downstream NLP tasks. On finetuning tasks where gMLP performs worse, making the gMLP model substantially larger can close the gap with Transformers. In general, our experiments show that gMLP can scale as well as Transformers over increased data and compute.

PDF Abstract NeurIPS 2021 PDF NeurIPS 2021 Abstract

Code

Add Remove Mark official

labmlai/annotated_deep_learning_pap…

↳ View annotated code at

labml.ai

47,992

rwightman/pytorch-image-models

29,758

xmu-xiaoma666/External-Attention-py…

10,842

BR-IDL/PaddleViT

1,185

nlpodyssey/spago

1,133

See all 20 implementations

Tasks

Add Remove

Image Classification

Natural Language Inference

Question Answering

Sentiment Analysis

Datasets

ImageNet

GLUE

SST

SQuAD

MultiNLI SST-2

Results from the Paper

Edit

Ranked #25 on Natural Language Inference on MultiNLI

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Benchmark
Image Classification	ImageNet	gMLP-B	Top 1 Accuracy	81.6%	# 569	Compare
			Number of params	73M	# 793	Compare
			GFLOPs	31.6	# 395	Compare
Natural Language Inference	MultiNLI	gMLP-large	Matched	86.2	# 25	Compare
Natural Language Inference	MultiNLI	gMLP-large	Mismatched	86.5	# 13	Compare
Question Answering	SQuAD2.0	gMLP-large	F1	78.3	# 226	Compare
Sentiment Analysis	SST-2 Binary classification	gMLP-large	Accuracy	94.8	# 28	Compare

Methods

Add Remove

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • gMLP • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Spatial Gating Unit • Weight Decay • WordPiece

Edit Social Preview

Pay Attention to MLPs

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove