TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	Twins-SVT-L (UperNet, ImageNet-1k pretrain)	Validation mIoU	50.2	# 110
Semantic Segmentation	ADE20K val	Twins-SVT-L (UperNet, ImageNet-1k pretrain)	mIoU	50.2	# 48
Image Classification	ImageNet	Twins-SVT-L	Top 1 Accuracy	83.7%	# 365
Image Classification	ImageNet	Twins-SVT-L	Number of params	99.2M	# 865
Image Classification	ImageNet	Twins-SVT-L	GFLOPs	15.1	# 338
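
The mIoU figures in the table above are mean intersection-over-union scores, presumably reported as percentages. As a rough illustration of the metric only, the sketch below computes mIoU over flat lists of per-pixel class labels. The function name `mean_iou` and its parameters are hypothetical, and benchmark toolkits typically accumulate intersections and unions over the whole validation set before dividing, so this is not the evaluation code behind the reported numbers.

```python
from collections import Counter


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are equal-length sequences of integer class labels.
    Positions whose target label equals ignore_index are skipped.
    """
    intersection = Counter()
    pred_count = Counter()
    target_count = Counter()
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes: per-class IoU is 1/3, 2/3 and 1/2.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(f"mIoU = {100 * score:.1f}")
```

Skipping classes that appear in neither the prediction nor the ground truth avoids dividing by zero; whether absent classes count as zero or are skipped is a convention that differs between toolkits.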

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/twins-revisiting-spatial-attention-design-in/semantic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k-val?p=twins-revisiting-spatial-attention-design-in)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/twins-revisiting-spatial-attention-design-in/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=twins-revisiting-spatial-attention-design-in)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/twins-revisiting-spatial-attention-design-in/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=twins-revisiting-spatial-attention-design-in)`

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

NeurIPS 2021 · Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of spatial attention and demonstrate that a carefully devised yet simple spatial attention mechanism performs favourably against the state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely Twins-PCPVT and Twins-SVT. Our proposed architectures are highly efficient and easy to implement, only involving matrix multiplications that are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks. Our code is released at https://github.com/Meituan-AutoML/Twins.
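
To give a feel for the efficiency claim, the sketch below counts query-key pairs for plain global self-attention and for a spatially separable scheme that alternates locally-grouped attention with global sub-sampled attention, the two components listed under Methods. It assumes non-overlapping windows of size k1 x k2 and one summary key per window. The function names and the 56 x 56 map with 7 x 7 windows are illustrative assumptions, not the configuration used in the paper.

```python
def attention_pairs_global(h, w):
    """Query-key pairs when every token attends to every token."""
    n = h * w
    return n * n


def attention_pairs_separable(h, w, k1, k2):
    """Query-key pairs for one locally-grouped layer plus one
    sub-sampled global layer (assumed: one summary key per window)."""
    if h % k1 != 0 or w % k2 != 0:
        raise ValueError("feature map must divide evenly into windows")
    n = h * w
    num_windows = (h // k1) * (w // k2)
    local_pairs = n * (k1 * k2)      # each token attends within its own window
    global_pairs = n * num_windows   # each token attends to one key per window
    return local_pairs + global_pairs


# Illustrative sizes only: a 56 x 56 feature map with 7 x 7 windows.
print(attention_pairs_global(56, 56))           # 9834496
print(attention_pairs_separable(56, 56, 7, 7))  # 354368
```

Under these assumptions the separable scheme needs more than an order of magnitude fewer pairs, and both layers still reduce to batched matrix multiplications. The count ignores the channel dimension and the cost of producing the summary keys.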

Code

Repository	Stars
Meituan-AutoML/Twins (official)	558
rwightman/pytorch-image-models	29,758
PaddlePaddle/PaddleClas	5,254
open-mmlab/mmclassification	3,156
ttt496/vit-pytorch	

See all 8 implementations

Tasks

Image Classification

Semantic Segmentation

Datasets

ImageNet

MS COCO

ADE20K

ImageNet-1K

Results from the Paper

Ranked #48 on Semantic Segmentation on ADE20K val

Methods

Conditional Positional Encoding • Dense Connections • Depthwise Convolution • Global Sub-Sampled Attention • Layer Normalization • Linear Layer • Locally-Grouped Self-Attention • Multi-Head Attention • Positional Encoding Generator • Residual Connection • Scaled Dot-Product Attention • Softmax • Spatially Separable Self-Attention • Twins-PCPVT • Twins-SVT • Vision Transformer
