Representation Separation for Semantic Segmentation with Vision Transformers

28 Dec 2022  ·  Yuanduo Hong, Huihui Pan, Weichao Sun, Xinghu Yu, Huijun Gao ·

Vision transformers (ViTs), which encode an image as a sequence of patches, bring new paradigms to semantic segmentation. We present an efficient framework of representation separation at the local-patch level and the global-region level for semantic segmentation with ViTs. It targets the peculiar over-smoothness of ViTs in semantic segmentation, and thus differs from the currently popular paradigms of context modeling and from most existing related methods, which reinforce the advantage of attention. We first present a decoupled two-pathway network in which a second pathway enhances and passes down local-patch discrepancy, complementary to the global representations of the transformer. We then propose a spatially adaptive separation module that yields more separated deep representations, and a discriminative cross-attention that produces more discriminative region representations through novel auxiliary supervisions. The proposed methods achieve impressive results: 1) incorporated with large-scale plain ViTs, our methods achieve new state-of-the-art performance on five widely used benchmarks; 2) using masked pre-trained plain ViTs, we achieve 68.9% mIoU on PASCAL Context, setting a new record; 3) pyramid ViTs integrated with the decoupled two-pathway network even surpass well-designed high-resolution ViTs on Cityscapes; 4) the representations improved by our framework transfer favorably to images with natural corruptions. The code will be released publicly.
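The two-pathway idea can be illustrated with a minimal NumPy sketch. This is our own toy illustration, not the authors' implementation: the function names and the fusion-by-addition scheme are assumptions. A stand-in "transformer" pathway mixes patch features toward their global mean (mimicking over-smoothing), while a parallel local pathway preserves each patch's deviation from that mean (the local-patch discrepancy) and re-injects it before prediction.

```python
import numpy as np

# Hypothetical sketch of a decoupled two-pathway fusion (names are ours,
# not from the paper). The transformer pathway smooths patch features;
# the local pathway preserves per-patch discrepancy lost by global mixing.

def transformer_pathway(patches):
    # Stand-in for ViT blocks: attention mixes tokens, pulling every patch
    # toward the global mean and reducing patch-to-patch variance.
    global_mean = patches.mean(axis=0, keepdims=True)
    return 0.5 * patches + 0.5 * global_mean  # over-smoothed representation

def local_pathway(patches):
    # Stand-in local branch: keep each patch's deviation from the mean,
    # i.e. the local-patch discrepancy.
    return patches - patches.mean(axis=0, keepdims=True)

def two_pathway(patches, alpha=1.0):
    # Fuse: re-inject local discrepancy into the global representation.
    return transformer_pathway(patches) + alpha * local_pathway(patches)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))  # 16 patch tokens, 8-dim features
fused = two_pathway(x)
# Patch-to-patch variance is higher after fusion than after the
# smoothing pathway alone, i.e. representations are more separated.
print(transformer_pathway(x).var(axis=0).mean() < fused.var(axis=0).mean())
```

The point of the sketch is only that adding back a discrepancy term counteracts the variance collapse of global mixing; the paper's actual modules (spatially adaptive separation, discriminative cross-attention) are learned, not fixed arithmetic.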

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | RSSeg-ViT-L (BEiT pretrain) | Validation mIoU | 58.4 | #14 |
| Semantic Segmentation | ADE20K | RSSeg-ViT-L (BEiT pretrain) | Params (M) | 330 | #14 |
| Semantic Segmentation | ADE20K val | RSSeg-ViT-L (BEiT pretrain) | mIoU | 58.4 | #9 |
| Semantic Segmentation | COCO-Stuff test | RSSeg-ViT-L | mIoU | 52.0% | #3 |
| Semantic Segmentation | COCO-Stuff test | RSSeg-ViT-L (BEiT pretrain) | mIoU | 52.6% | #2 |
| Semantic Segmentation | PASCAL Context | RSSeg-ViT-L | mIoU | 67.5 | #5 |
| Semantic Segmentation | PASCAL Context | RSSeg-ViT-L (BEiT pretrain) | mIoU | 68.9 | #3 |
