Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Open Vocabulary Semantic Segmentation	ADE20K-150	MaskCLIP	mIoU	23.7	#12
Open Vocabulary Semantic Segmentation	ADE20K-847	MaskCLIP	mIoU	8.2	#12
Open Vocabulary Semantic Segmentation	PASCAL Context-459	MaskCLIP	mIoU	10	#10
Open Vocabulary Semantic Segmentation	PASCAL Context-59	MaskCLIP	mIoU	45.9	#13
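
The metric in the table above is mIoU (mean intersection over union), which appears to be reported on a 0-100 scale. As a rough illustration of how such a number is typically computed, here is a minimal sketch in plain Python. The function name `mean_iou`, the flat-list input format, and the `ignore_index=255` convention are assumptions made for this example and are not taken from the paper or its code; benchmark implementations usually accumulate intersections and unions over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class ids, one per pixel.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Pixel counts once toward both intersection and union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Mismatch: the pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)
print(round(100 * miou, 1))  # per-class IoUs are 1/3, 2/3, 1/2, so this prints 50.0
```

Classes that appear in neither the prediction nor the ground truth are left out of the average here; some evaluation scripts instead average over every class in the vocabulary, which matters for large label sets such as ADE20K-847.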

Open-Vocabulary Universal Image Segmentation with MaskCLIP

18 Aug 2022 · Zheng Ding, Jieke Wang, Zhuowen Tu

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories given as text-based descriptions at inference time. We first build a baseline method by directly adopting pre-trained CLIP models without finetuning or distillation. We then develop MaskCLIP, a Transformer-based approach with a MaskCLIP Visual Encoder, an encoder-only module that seamlessly integrates mask tokens with a pre-trained ViT CLIP model for semantic/instance segmentation and class prediction. MaskCLIP learns to efficiently and effectively utilize pre-trained partial/dense CLIP features within the MaskCLIP Visual Encoder, which avoids the time-consuming student-teacher training process. MaskCLIP outperforms previous methods for semantic/instance/panoptic segmentation on ADE20K and PASCAL datasets. We show qualitative illustrations for MaskCLIP with online custom categories. Project website: https://maskclip.github.io.
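
To make the "arbitrary categories of text-based descriptions" idea concrete, the sketch below shows the general CLIP-style recipe for open-vocabulary class prediction: each mask gets an embedding, each category name gets a text embedding, and the mask is assigned the category with the highest cosine similarity. This is not the authors' implementation. The function names, the toy three-dimensional vectors (stand-ins for real CLIP text and mask features), and the temperature of 0.01 are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_masks(mask_embeddings, text_embeddings, class_names, temperature=0.01):
    """Assign each mask the class whose text embedding is most similar.

    temperature is an assumed value; lower values sharpen the distribution.
    Returns a list of (class_name, probability) pairs, one per mask.
    """
    text = [l2_normalize(t) for t in text_embeddings]
    results = []
    for emb in mask_embeddings:
        e = l2_normalize(emb)
        # Cosine similarity between the mask and every class, scaled by temperature.
        logits = [sum(a * b for a, b in zip(e, t)) / temperature for t in text]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((class_names[best], probs[best]))
    return results


# Hypothetical vocabulary and toy embeddings; real ones would come from CLIP encoders.
class_names = ["cat", "sofa", "sky"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
predictions = classify_masks(mask_embeddings, text_embeddings, class_names)
print([name for name, _ in predictions])  # ['cat', 'sky']
```

Because the vocabulary enters only through `class_names` and `text_embeddings`, swapping in a different category list at inference time requires no retraining in this sketch, which is the property the open-vocabulary setting relies on.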

Code

mlpc-ucsd/maskclip official

Tasks

Image Segmentation

Instance Segmentation

Open Vocabulary Panoptic Segmentation

Open Vocabulary Semantic Segmentation

Panoptic Segmentation

Segmentation

Semantic Segmentation

Datasets

MS COCO

ADE20K

PASCAL Context

Results from the Paper

Ranked #10 on Open Vocabulary Semantic Segmentation on PASCAL Context-459

Methods

CLIP
