Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer

7 Jun 2021 · Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu

Very recently, window-based Transformers, which compute self-attention within non-overlapping local windows, have demonstrated promising results on image classification, semantic segmentation, and object detection. However, less study has been devoted to the cross-window connection, which is the key element for improving representation ability. In this work, we revisit the spatial shuffle as an efficient way to build connections among windows. As a result, we propose a new vision transformer, named Shuffle Transformer, which is highly efficient and easy to implement by modifying two lines of code. Furthermore, depth-wise convolution is introduced to complement the spatial shuffle by enhancing neighbor-window connections. The proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification, object detection, and semantic segmentation. Code will be released for reproduction.
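
To make the "two lines of code" claim concrete: a spatial shuffle can be folded into the window-partition reshape by swapping the grid and window axes, so each new window gathers one token from each of several original windows. Below is a minimal PyTorch sketch of this idea; the `(B, H, W, C)` layout, the window size `ws`, and the function names are illustrative assumptions, not the authors' released code.

```python
import torch

def window_partition(x, ws):
    """Standard partition into non-overlapping ws x ws windows.
    x: (B, H, W, C) with H and W divisible by ws."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    # (num_windows * B, ws * ws, C): each window holds contiguous tokens
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def shuffled_window_partition(x, ws):
    """Spatially shuffled partition: factor each spatial axis the other
    way around, so windows are formed from strided (shuffled) positions."""
    B, H, W, C = x.shape
    # The two changed lines: (ws, H // ws) replaces (H // ws, ws) per axis,
    # and the permute indices swap accordingly
    x = x.view(B, ws, H // ws, ws, W // ws, C)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(-1, ws * ws, C)

if __name__ == "__main__":
    x = torch.randn(2, 8, 8, 32)                  # toy map: H = W = 8, ws = 4
    print(shuffled_window_partition(x, 4).shape)  # torch.Size([8, 16, 32])
```

The shuffle only changes which tokens are grouped into a window; after window attention, the inverse reshape/permute would restore the original spatial layout.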


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | UperNet Shuffle-T | Validation mIoU | 47.6 | #152 |
| Semantic Segmentation | ADE20K | UperNet Shuffle-B | Validation mIoU | 50.5 | #104 |
| Semantic Segmentation | ADE20K val | UperNet Shuffle-B | mIoU | 50.5 | #45 |
| Semantic Segmentation | ADE20K val | UperNet Shuffle-S | mIoU | 49.6 | #53 |
| Semantic Segmentation | ADE20K val | UperNet Shuffle-T | mIoU | 47.6 | #61 |
