TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	Mask2Former (Swin-B)	Validation mIoU	55.1	# 44
Semantic Segmentation	ADE20K	Mask2Former (SwinL)	Validation mIoU	57.3	# 27
Semantic Segmentation	ADE20K	Mask2Former (SwinL-FaPN)	Validation mIoU	57.7	# 21
Semantic Segmentation	ADE20K	Mask2Former (Swin-L-FaPN)	Validation mIoU	56.4	# 34
Panoptic Segmentation	ADE20K val	Mask2Former (Swin-L)	PQ	48.1	# 15
Panoptic Segmentation	ADE20K val	Mask2Former (Swin-L)	AP	34.2	# 11
Panoptic Segmentation	ADE20K val	Mask2Former (Swin-L)	mIoU	54.5	# 15
Semantic Segmentation	ADE20K val	Mask2Former (Swin-L-FaPN)	mIoU	56.4	# 23
Panoptic Segmentation	ADE20K val	Mask2Former (ResNet-50, 640x640)	AP	26.5	# 13
Panoptic Segmentation	ADE20K val	Mask2Former (ResNet-50, 640x640)	mIoU	46.1	# 17
Panoptic Segmentation	ADE20K val	Mask2Former (ResNet-50, 640x640)	PQ	39.7	# 19
Panoptic Segmentation	ADE20K val	Mask2Former (Swin-L + FAPN, 640x640)	PQ	46.2	# 16
Panoptic Segmentation	ADE20K val	Mask2Former (Swin-L + FAPN, 640x640)	AP	33.2	# 12
Panoptic Segmentation	ADE20K val	Mask2Former (Swin-L + FAPN, 640x640)	mIoU	55.4	# 12
Instance Segmentation	ADE20K val	Mask2Former (Swin-L, single-scale)	AP	34.9	# 9
Instance Segmentation	ADE20K val	Mask2Former (Swin-L, single-scale)	APS	16.3	# 4
Instance Segmentation	ADE20K val	Mask2Former (Swin-L, single-scale)	APM	40	# 4
Instance Segmentation	ADE20K val	Mask2Former (Swin-L, single-scale)	APL	54.7	# 5
Panoptic Segmentation	ADE20K val	Panoptic-DeepLab (SwideRNet)	PQ	37.9	# 20
Panoptic Segmentation	ADE20K val	Panoptic-DeepLab (SwideRNet)	mIoU	50	# 16
Instance Segmentation	ADE20K val	Mask2Former (ResNet-50)	APM	28.9	# 7
Instance Segmentation	ADE20K val	Mask2Former (ResNet-50)	APL	43.1	# 7
Instance Segmentation	ADE20K val	Mask2Former (Swin-L + FAPN)	AP	33.4	# 10
Instance Segmentation	ADE20K val	Mask2Former (Swin-L + FAPN)	APS	14.6	# 6
Instance Segmentation	ADE20K val	Mask2Former (Swin-L + FAPN)	APM	37.6	# 6
Instance Segmentation	ADE20K val	Mask2Former (Swin-L + FAPN)	APL	54.6	# 6
Instance Segmentation	ADE20K val	Mask2Former (ResNet50)	AP	26.4	# 11
Instance Segmentation	ADE20K val	Mask2Former (ResNet50)	APS	10.4	# 7
Semantic Segmentation	ADE20K val	Mask2Former (Swin-L-FaPN, multiscale)	mIoU	57.7	# 16
Instance Segmentation	Cityscapes val	Mask2Former (Swin-L, single-scale)	mask AP	43.7	# 9
Semantic Segmentation	Cityscapes val	Mask2Former (Swin-L)	mIoU	84.3	# 15
Panoptic Segmentation	Cityscapes val	Mask2Former (Swin-L)	PQ	66.6	# 14
Panoptic Segmentation	Cityscapes val	Mask2Former (Swin-L)	mIoU	82.9	# 13
Panoptic Segmentation	Cityscapes val	Mask2Former (Swin-L)	AP	43.6	# 12
Instance Segmentation	Cityscapes val	Mask2Former (Swin-B)	mask AP	42	# 10
Instance Segmentation	Cityscapes val	Mask2Former (Swin-S)	mask AP	41.8	# 11
Instance Segmentation	Cityscapes val	Mask2Former (Swin-T)	mask AP	39.7	# 13
Instance Segmentation	Cityscapes val	Mask2Former (ResNet-101)	mask AP	38.5	# 14
Instance Segmentation	Cityscapes val	Mask2Former (ResNet-50)	mask AP	37.4	# 15
Panoptic Segmentation	COCO minival	Mask2Former (single-scale)	PQ	57.8	# 14
Panoptic Segmentation	COCO minival	Mask2Former (single-scale)	PQth	64.2	# 8
Panoptic Segmentation	COCO minival	Mask2Former (single-scale)	PQst	48.1	# 9
Panoptic Segmentation	COCO minival	Mask2Former (single-scale)	AP	48.6	# 8
Instance Segmentation	COCO minival	Mask2Former (Swin-L)	mask AP	50.1	# 24
Instance Segmentation	COCO test-dev	Mask2Former (Swin-L, single scale)	mask AP	50.5	# 19
Instance Segmentation	COCO test-dev	Mask2Former (Swin-L, single scale)	AP50	74.9	# 6
Instance Segmentation	COCO test-dev	Mask2Former (Swin-L, single scale)	AP75	54.9	# 5
Instance Segmentation	COCO test-dev	Mask2Former (Swin-L, single scale)	APS	29.1	# 10
Instance Segmentation	COCO test-dev	Mask2Former (Swin-L, single scale)	APM	53.8	# 5
Instance Segmentation	COCO test-dev	Mask2Former (Swin-L, single scale)	APL	71.2	# 2
Panoptic Segmentation	COCO test-dev	Mask2Former (Swin-L)	PQ	58.3	# 3
Panoptic Segmentation	COCO test-dev	Mask2Former (Swin-L)	PQst	48.1	# 3
Panoptic Segmentation	COCO test-dev	Mask2Former (Swin-L)	PQth	65.1	# 1
Instance Segmentation	COCO val (panoptic labels)	Mask2Former (Swin-L, single-scale)	AP	49.1	# 3
Semantic Segmentation	Mapillary val	Mask2Former (Swin-L, multiscale)	mIoU	64.7	# 3
Semantic Segmentation	MS COCO	MaskFormer (Swin-L, single-scale)	mIoU	64.8	# 5
Semantic Segmentation	MS COCO	Mask2Former (Swin-L, single-scale)	mIoU	67.4	# 3

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/panoptic-segmentation-on-coco-test-dev)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-test-dev?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/instance-segmentation-on-coco-val-panoptic)](https://paperswithcode.com/sota/instance-segmentation-on-coco-val-panoptic?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/semantic-segmentation-on-mapillary-val)](https://paperswithcode.com/sota/semantic-segmentation-on-mapillary-val?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/semantic-segmentation-on-coco-1)](https://paperswithcode.com/sota/semantic-segmentation-on-coco-1?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/instance-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/instance-segmentation-on-ade20k-val?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/instance-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/instance-segmentation-on-cityscapes-val?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/panoptic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-ade20k-val?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/panoptic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-cityscapes-val?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/panoptic-segmentation-on-coco-minival)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-minival?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/semantic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes-val?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/semantic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k-val?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/instance-segmentation-on-coco)](https://paperswithcode.com/sota/instance-segmentation-on-coco?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=masked-attention-mask-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/masked-attention-mask-transformer-for/instance-segmentation-on-coco-minival)](https://paperswithcode.com/sota/instance-segmentation-on-coco-minival?p=masked-attention-mask-transformer-for)`

Masked-attention Mask Transformer for Universal Image Segmentation

CVPR 2022 · Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar

Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
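The abstract describes masked attention only at a high level. As a rough illustration, the sketch below restricts each query's cross-attention to the pixels inside that query's predicted mask by setting the logits of all other pixels to negative infinity before the softmax. The function name `masked_cross_attention`, the flat list-of-vectors layout, the scaling by the square root of the feature dimension, and the fallback to full attention for an empty mask are assumptions made for this example, not details taken from this page or from the official code.

```python
import math

def masked_cross_attention(queries, keys, values, masks):
    """Cross-attention where each query only sees pixels inside its mask.

    queries: N query vectors (each of length d)
    keys, values: P per-pixel vectors (image features flattened over pixels)
    masks: N x P booleans, True where the pixel lies inside the query's
           predicted (binarized) mask
    """
    outputs = []
    for q, mask in zip(queries, masks):
        # Assumption: an empty mask falls back to attending over all pixels,
        # otherwise the softmax below would have nothing to normalize.
        if not any(mask):
            mask = [True] * len(keys)
        scale = 1.0 / math.sqrt(len(q))
        # Pixels outside the mask get a logit of -inf, so their weight is 0.
        logits = [
            sum(a * b for a, b in zip(q, k)) * scale if inside else -math.inf
            for k, inside in zip(keys, mask)
        ]
        peak = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs

# Toy input: two queries, three pixels.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
masks = [[True, False, False], [False, True, True]]
print(masked_cross_attention(queries, keys, values, masks))
# prints [[10.0, 0.0], [2.5, 7.5]]
```

The first query attends to a single pixel and simply copies its value; the second averages the two pixels inside its mask, which have equal logits. In both cases pixels outside the mask contribute nothing to the output.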

Code

facebookresearch/Mask2Former (official): 2,201 stars
huggingface/transformers: 124,527 stars
open-mmlab/mmdetection: 27,708 stars
alibaba/EasyCV: 1,671 stars
DdeGeus/Mask2Former-IBS

Tasks

Image Segmentation

Instance Segmentation

Panoptic Segmentation

Segmentation

Semantic Segmentation

Universal Segmentation

Datasets

MS COCO

Cityscapes

ADE20K

Mapillary Vistas Dataset

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
