TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ResNet50, single-scale, 1281x1281)	PQ	42.3	# 17
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ResNet50, single-scale, 1281x1281)	AP	-	# 14
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ResNet50, single-scale, 1281x1281)	mIoU	45.3	# 18
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ConvNeXt-L, single-scale, 641x641)	PQ	48.7	# 14
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ConvNeXt-L, single-scale, 641x641)	AP	-	# 14
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ConvNeXt-L, single-scale, 641x641)	mIoU	54.8	# 14
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ConvNeXt-L, single-scale, 1281x1281)	PQ	50.9	# 7
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ConvNeXt-L, single-scale, 1281x1281)	AP	-	# 14
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ConvNeXt-L, single-scale, 1281x1281)	mIoU	55.2	# 13
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ResNet50, single-scale, 641x641)	PQ	41.5	# 18
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ResNet50, single-scale, 641x641)	AP	-	# 14
Panoptic Segmentation	ADE20K val	kMaX-DeepLab (ResNet50, single-scale, 641x641)	mIoU	45.0	# 19
Panoptic Segmentation	Cityscapes test	kMaX-DeepLab (single-scale)	PQ	66.2	# 5
Semantic Segmentation	Cityscapes test	kMaX-DeepLab (ConvNeXt-L, fine only)	Mean IoU (class)	83.2%	# 17
Panoptic Segmentation	Cityscapes val	kMaX-DeepLab (single-scale)	PQ	68.4	# 6
Panoptic Segmentation	Cityscapes val	kMaX-DeepLab (single-scale)	mIoU	83.5	# 7
Panoptic Segmentation	Cityscapes val	kMaX-DeepLab (single-scale)	AP	44.0	# 11
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale, drop query with 256 queries)	PQ	58.0	# 9
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale, drop query with 256 queries)	PQth	64.2	# 8
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale, drop query with 256 queries)	PQst	48.6	# 4
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale)	PQ	57.9	# 11
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale)	PQth	64.0	# 10
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale)	PQst	48.6	# 4
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale, pseudo-labels)	PQ	58.1	# 7
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale, pseudo-labels)	PQth	64.3	# 6
Panoptic Segmentation	COCO minival	kMaX-DeepLab (single-scale, pseudo-labels)	PQst	48.8	# 2
Panoptic Segmentation	COCO test-dev	kMaX-DeepLab (single-scale)	PQ	58.5	# 2
Panoptic Segmentation	COCO test-dev	kMaX-DeepLab (single-scale)	PQst	49.0	# 2
Panoptic Segmentation	COCO test-dev	kMaX-DeepLab (single-scale)	PQth	64.8	# 2
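
The PQ, PQth, and PQst columns above are not defined on this page. As a rough guide, and assuming the standard panoptic quality definition (predicted and ground-truth segments count as matched when their IoU exceeds 0.5), the per-class score can be sketched as below. The function name and arguments are illustrative and do not come from any evaluation toolkit. Benchmarks typically average this per-class value over all classes (PQ), over "thing" classes only (PQth), or over "stuff" classes only (PQst).

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class panoptic quality (illustrative sketch, not a toolkit API).

    matched_ious:        IoU of each matched (true positive) segment pair
    num_false_positives: predicted segments with no ground-truth match
    num_false_negatives: ground-truth segments with no predicted match
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        # No segments of this class in either prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


# Two matched segments plus one spurious and one missed segment.
score = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(f"PQ = {100 * score:.1f}")
```

The leaderboard values are reported on the same 0 to 100 scale used in the print statement.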

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-means-mask-transformer/panoptic-segmentation-on-coco-test-dev)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-test-dev?p=k-means-mask-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-means-mask-transformer/panoptic-segmentation-on-cityscapes-test)](https://paperswithcode.com/sota/panoptic-segmentation-on-cityscapes-test?p=k-means-mask-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-means-mask-transformer/panoptic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-cityscapes-val?p=k-means-mask-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-means-mask-transformer/panoptic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/panoptic-segmentation-on-ade20k-val?p=k-means-mask-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-means-mask-transformer/panoptic-segmentation-on-coco-minival)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-minival?p=k-means-mask-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-means-mask-transformer/semantic-segmentation-on-cityscapes)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes?p=k-means-mask-transformer)`

kMaX-DeepLab: k-means Mask Transformer

8 Jul 2022 · Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

The rise of transformers in vision tasks not only advances network backbone designs, but also opens a new path toward end-to-end image recognition (e.g., object detection and panoptic segmentation). Originating in Natural Language Processing (NLP), transformer architectures, consisting of self-attention and cross-attention, effectively learn long-range interactions between elements in a sequence. However, we observe that most existing transformer-based vision models simply borrow the idea from NLP, neglecting the crucial difference between languages and images, particularly the extremely large sequence length of spatially flattened pixel features. This in turn impedes the learning of cross-attention between pixel features and object queries. In this paper, we rethink the relationship between pixels and object queries and propose to reformulate cross-attention learning as a clustering process. Inspired by the traditional k-means clustering algorithm, we develop a k-means Mask Xformer (kMaX-DeepLab) for segmentation tasks, which not only improves the state of the art, but also enjoys a simple and elegant design. As a result, our kMaX-DeepLab achieves new state-of-the-art performance on the COCO val set with 58.0% PQ, the Cityscapes val set with 68.4% PQ, 44.0% AP, and 83.5% mIoU, and the ADE20K val set with 50.9% PQ and 55.2% mIoU, without test-time augmentation or external datasets. We hope our work can shed some light on designing transformers tailored for vision tasks. TensorFlow code and models are available at https://github.com/google-research/deeplab2. A PyTorch re-implementation is also available at https://github.com/bytedance/kmax-deeplab.
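
The abstract's central idea, treating cross-attention between object queries and pixels as a k-means-style clustering step, can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes that object queries act as cluster centers, that each pixel is hard-assigned to its best-matching center with an argmax over the cluster axis (in place of a softmax over the long spatial axis), and that centers are updated residually from their assigned pixels. The function and argument names are hypothetical, and learned projections, normalization, and multi-head structure are omitted.

```python
import numpy as np


def kmeans_cross_attention(centers, pixel_keys, pixel_values):
    """One k-means-style cross-attention update (illustrative sketch).

    centers:      (N, D) cluster centers, playing the role of object queries
    pixel_keys:   (HW, D) flattened pixel features used to compute affinity
    pixel_values: (HW, D) flattened pixel features used to update centers
    """
    # Affinity between every cluster center and every pixel: shape (N, HW).
    affinity = centers @ pixel_keys.T

    # Assignment step: each pixel picks exactly one cluster via an argmax
    # over the cluster axis, giving a one-hot (N, HW) assignment matrix.
    winners = affinity.argmax(axis=0)
    assignment = np.zeros_like(affinity)
    assignment[winners, np.arange(affinity.shape[1])] = 1.0

    # Update step: add the values of the assigned pixels to each center.
    updated_centers = centers + assignment @ pixel_values
    return updated_centers, winners


# Toy example: 2 cluster centers and 3 pixels with 2-dimensional features.
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
pixels = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0]])
new_centers, winners = kmeans_cross_attention(centers, pixels, pixels)
print(winners)
print(new_centers)
```

Because the assignment is one-hot per pixel, the returned `winners` array can also be read directly as a segmentation of the pixels into clusters, which is the sense in which attention and mask prediction share one mechanism in this sketch.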

Code

google-research/deeplab2 official

bytedance/kmax-deeplab official

Tasks

Clustering

Object Detection

Panoptic Segmentation

Semantic Segmentation

Datasets

MS COCO

Cityscapes

ADE20K

Results from the Paper

Ranked #2 on Panoptic Segmentation on COCO test-dev

Methods

k-Means Clustering
