TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	Light-Ham (VAN-Large)	Validation mIoU	51.0	# 96
Semantic Segmentation	ADE20K	Light-Ham (VAN-Large)	Params (M)	45.6	# 51
Semantic Segmentation	ADE20K	Light-Ham (VAN-Large)	GFLOPs (512 x 512)	55.0	# 5
Semantic Segmentation	ADE20K	Light-Ham (VAN-Small, D=256)	Validation mIoU	45.2	# 187
Semantic Segmentation	ADE20K	Light-Ham (VAN-Small, D=256)	Params (M)	13.8	# 58
Semantic Segmentation	ADE20K	Light-Ham (VAN-Small, D=256)	GFLOPs (512 x 512)	15.8	# 2
Semantic Segmentation	ADE20K	Light-Ham (VAN-Base)	Validation mIoU	49.6	# 122
Semantic Segmentation	ADE20K	Light-Ham (VAN-Base)	Params (M)	27.4	# 54
Semantic Segmentation	ADE20K	Light-Ham (VAN-Base)	GFLOPs (512 x 512)	34.4	# 4
Semantic Segmentation	ADE20K	Light-Ham (VAN-Huge)	Validation mIoU	51.5	# 88
Semantic Segmentation	ADE20K	Light-Ham (VAN-Huge)	Params (M)	61.1	# 40
Semantic Segmentation	ADE20K	Light-Ham (VAN-Huge)	GFLOPs (512 x 512)	71.8	# 7
Semantic Segmentation	ADE20K	HamNet (ResNet-101)	Validation mIoU	46.8	# 162
Semantic Segmentation	ADE20K val	Light-Ham (VAN-Large, 46M, IN-1k, MS)	mIoU	51.0	# 44
Semantic Segmentation	ADE20K val	Light-Ham (VAN-Base, 27M, IN-1k, MS)	mIoU	49.6	# 53
Semantic Segmentation	ADE20K val	Light-Ham (VAN-Huge, 61M, IN-1k, MS)	mIoU	51.5	# 41
Conditional Image Generation	ImageNet 128x128	HamGAN	FID	14.80	# 18
Conditional Image Generation	ImageNet 128x128	HamGAN	Inception score	58.75	# 14
Semantic Segmentation	PASCAL Context	HamNet (ResNet-101)	mIoU	55.2	# 28
Semantic Segmentation	PASCAL VOC 2012 test	HamNet w/o COCO (ResNet-101)	Mean IoU	85.9%	# 7

Is Attention Better Than Matrix Decomposition?

ICLR 2021 · Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin

As an essential ingredient of modern deep learning, the attention mechanism, especially self-attention, plays a vital role in global correlation discovery. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago regarding the performance and computational cost for encoding long-distance dependencies. We model the global context issue as a low-rank recovery problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs can perform favorably against the popular global context module self-attention when carefully coping with gradients back-propagated through MDs. Comprehensive experiments are conducted on vision tasks where it is crucial to learn the global context, including semantic segmentation and image generation, demonstrating significant improvements over self-attention and its variants.
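The core idea in the abstract, factorizing a feature matrix into sub-matrices with an MD solver and reconstructing a low-rank embedding, can be illustrated with a small sketch. The version below assumes non-negative matrix factorization (NMF) with multiplicative updates as one illustrative choice of MD. The names `nmf_reconstruct`, `relative_error`, `rank`, and `steps` are hypothetical and not taken from the paper or its official repository, and the sketch leaves out everything else a full block would need, including any surrounding layers and the handling of gradients back-propagated through the solver.

```python
import numpy as np


def nmf_reconstruct(x, rank=8, steps=6, eps=1e-6, seed=0):
    """Low-rank reconstruction of a non-negative matrix x of shape (d, n).

    Approximately solves min ||x - D @ C||_F^2 subject to D >= 0, C >= 0
    using multiplicative updates, then returns the reconstruction D @ C
    together with the two factors. This is an illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    d, n = x.shape
    D = rng.random((d, rank))   # dictionary / bases, shape (d, rank)
    C = rng.random((rank, n))   # codes / coefficients, shape (rank, n)
    for _ in range(steps):
        # Each update multiplies by a non-negative ratio, so D and C
        # stay non-negative throughout. eps avoids division by zero.
        C *= (D.T @ x) / (D.T @ D @ C + eps)
        D *= (x @ C.T) / (D @ (C @ C.T) + eps)
    return D @ C, D, C


def relative_error(x, recon):
    """Frobenius-norm reconstruction error relative to the norm of x."""
    return np.linalg.norm(x - recon) / np.linalg.norm(x)
```

In this sketch a feature map of shape (channels, height, width) would first be flattened to (channels, height * width) and made non-negative (for example with a ReLU) before being passed to `nmf_reconstruct`. The cost per iteration is linear in the number of spatial positions `n`, in contrast to the quadratic `n x n` attention map of self-attention.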

Code

Repository	Stars
Gsunshine/Enjoy-Hamburger (official)	312
plumprc/MTS-Mixers	160
toqitahamid/gasformer	

Tasks

Conditional Image Generation

Image Generation

Semantic Segmentation

Datasets

ImageNet

ADE20K

PASCAL Context

PASCAL VOC 2012 test

Results from the Paper

Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 test

Methods

Hamburger
