TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	Seg-B-Mask/16 (MS, ViT-B)	Validation mIoU	50.0	# 114
Semantic Segmentation	ADE20K	Seg-L-Mask/16 (MS)	Validation mIoU	53.63	# 71
Semantic Segmentation	ADE20K	Seg-B/8 (MS, ViT-B)	Validation mIoU	49.61	# 121
Semantic Segmentation	ADE20K val	Seg-B/8 (MS, ViT-B)	mIoU	49.61	# 52
Semantic Segmentation	ADE20K val	Seg-B/8 (MS, ViT-B)	Pixel Accuracy	83.37	# 3
Semantic Segmentation	ADE20K val	Seg-B-Mask/16 (MS, ViT-B)	mIoU	50.0	# 49
Semantic Segmentation	ADE20K val	Seg-L-Mask/16 (MS, ViT-L)	mIoU	53.63	# 35
Semantic Segmentation	PASCAL Context	Seg-L-Mask/16	mIoU	59.0	# 15
Thermal Image Segmentation	RGB-T-Glass-Segmentation	Segmenter	MAE	0.072	# 17

Segmenter: Transformer for Semantic Segmentation

ICCV 2021 · Robin Strudel, Ricardo Garcia, Ivan Laptev, Cordelia Schmid

Image segmentation is often ambiguous at the level of individual image patches and requires contextual information to reach label consensus. In this paper we introduce Segmenter, a transformer model for semantic segmentation. In contrast to convolution-based methods, our approach models global context from the first layer and throughout the network. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. To do so, we rely on the output embeddings corresponding to image patches and obtain class labels from these embeddings with a point-wise linear decoder or a mask transformer decoder. We leverage models pre-trained for image classification and show that we can fine-tune them on the moderately sized datasets available for semantic segmentation. The linear decoder already yields excellent results, but performance can be further improved by a mask transformer that generates class masks. We conduct an extensive ablation study to show the impact of the different parameters; in particular, performance is better for large models and small patch sizes. Segmenter attains excellent results for semantic segmentation. It outperforms the state of the art on both the ADE20K and Pascal Context datasets and is competitive on Cityscapes.
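
The point-wise linear decoder mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `linear_decode`, its parameters, and the plain-list data layout are hypothetical, and each patch label is simply repeated over its patch (nearest-neighbour upsampling) as a simplification, whereas the paper uses smoother interpolation of the class scores.

```python
from typing import List

Vector = List[float]


def linear_decode(
    patch_embeddings: List[Vector],  # one D-dim embedding per image patch, row-major
    weights: List[Vector],           # K rows of D weights, one row per class (assumed layout)
    bias: Vector,                    # K biases
    grid_h: int,                     # number of patch rows (image height / patch size)
    grid_w: int,                     # number of patch columns (image width / patch size)
    patch_size: int,
) -> List[List[int]]:
    """Map patch embeddings to a per-pixel label map with a point-wise linear layer."""
    if len(patch_embeddings) != grid_h * grid_w:
        raise ValueError("expected one embedding per patch")

    # Point-wise: the same (weights, bias) pair is applied to every patch independently.
    patch_labels = []
    for emb in patch_embeddings:
        logits = [sum(w * x for w, x in zip(row, emb)) + b
                  for row, b in zip(weights, bias)]
        patch_labels.append(max(range(len(logits)), key=logits.__getitem__))

    # Reshape the patch sequence into a 2D grid and repeat each label over its patch.
    label_map = []
    for i in range(grid_h):
        row = []
        for j in range(grid_w):
            row.extend([patch_labels[i * grid_w + j]] * patch_size)
        label_map.extend([list(row) for _ in range(patch_size)])
    return label_map


# Toy usage: a 2x2 patch grid, 2-dim embeddings, 2 classes, patch size 2.
toy_map = linear_decode(
    patch_embeddings=[[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    bias=[0.0, 0.0],
    grid_h=2,
    grid_w=2,
    patch_size=2,
)
```

In this sketch the only thing that changes with a smaller patch size is the grid: more patches cover the same image, so the label map is predicted at a finer resolution, at the cost of a longer token sequence.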

Code

Repository	Stars
rstrudel/segmenter (official)	792
PaddlePaddle/PaddleSeg	8,228
BR-IDL/PaddleViT	1,183
EricKani/Segmenter-Based-on-OpenMML…
tue-mps/cts-segmenter

See all 7 implementations

Tasks

Image Classification

Image Segmentation

Segmentation

Semantic Segmentation

Thermal Image Segmentation

Datasets

ImageNet

Cityscapes

ADE20K

PASCAL Context

Results from the Paper

Ranked #15 on Semantic Segmentation on PASCAL Context
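
The mIoU and Pixel Accuracy metrics reported for this paper are standard segmentation measures. As a rough illustration, they can be computed from flat sequences of predicted and ground-truth class indices as sketched below. The function names, the `ignore_index` default of 255, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions of this sketch, not details taken from the page; benchmark evaluations typically accumulate the counts over the whole validation set before averaging.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean intersection-over-union over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:  # unlabeled pixels do not contribute (assumed convention)
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


def pixel_accuracy(pred: Sequence[int], target: Sequence[int],
                   ignore_index: int = 255) -> float:
    """Fraction of labeled pixels whose predicted class matches the ground truth."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    labeled = [(p, t) for p, t in zip(pred, target) if t != ignore_index]
    if not labeled:
        return 0.0
    return sum(p == t for p, t in labeled) / len(labeled)


# Toy usage: four pixels, two classes.
toy_miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
toy_acc = pixel_accuracy([0, 0, 1, 1], [0, 1, 1, 1])
```

Leaderboards usually report these values as percentages, so a returned value of 0.5 would correspond to an mIoU of 50.0.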


Methods

Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Vision Transformer
