A Simple Framework for Open-Vocabulary Segmentation and Detection

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets. To bridge the gap in vocabulary and annotation granularity, we first introduce a pre-trained text encoder to encode all the visual concepts in the two tasks and learn a common semantic space for them. This alone gives reasonably good results compared with counterparts trained on segmentation only. To further reconcile the two tasks, we identify two discrepancies: (i) task discrepancy: segmentation requires extracting masks for both foreground objects and background stuff, while detection concerns only the former; (ii) data discrepancy: box and mask annotations have different spatial granularity and are thus not directly interchangeable. To address these issues, we propose decoupled decoding to reduce the interference between foreground and background, and conditioned mask decoding to generate masks for given boxes. We then develop a simple encoder-decoder model encompassing all three techniques and train it jointly on COCO and Objects365. After pre-training, our model exhibits competitive or stronger zero-shot transferability for both segmentation and detection. Specifically, OpenSeeD outperforms the state of the art in open-vocabulary instance and panoptic segmentation across five datasets, and surpasses previous work in open-vocabulary detection on LVIS and ODinW under similar settings. When transferred to specific tasks, our model achieves new SoTA for panoptic segmentation on COCO and ADE20K, and for instance segmentation on ADE20K and Cityscapes. Finally, we note that OpenSeeD is the first to explore the potential of joint training on segmentation and detection, and we hope it can serve as a strong baseline for developing a single model for both tasks in the open world.
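
For concreteness, the sketch below illustrates how the three ingredients described in the abstract could fit together in PyTorch: a shared text-embedding classifier, decoupled foreground/background query sets, and box-conditioned mask queries. This is a minimal illustration under our own assumptions; the class and attribute names (`OpenVocabDecoderSketch`, `fg_queries`, `box_embed`, the `text_encoder` callable) are hypothetical and not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OpenVocabDecoderSketch(nn.Module):
    """Illustrative sketch only; names and shapes are assumptions,
    not the authors' code."""

    def __init__(self, text_encoder, hidden_dim=256, num_fg=300, num_bg=100):
        super().__init__()
        self.text_encoder = text_encoder  # frozen, pre-trained text encoder
        # Decoupled decoding: separate query sets for foreground "things"
        # (shared with detection) and background "stuff" (segmentation only).
        self.fg_queries = nn.Embedding(num_fg, hidden_dim)
        self.bg_queries = nn.Embedding(num_bg, hidden_dim)
        # Conditioned mask decoding: embed a ground-truth box
        # (cx, cy, w, h) into a query that is decoded into a mask.
        self.box_embed = nn.Linear(4, hidden_dim)
        self.logit_scale = nn.Parameter(torch.tensor(1.0 / 0.07))

    def classify(self, query_feats, class_names):
        # Encode category names from both tasks into one shared
        # semantic space, then score queries against them.
        with torch.no_grad():
            text = self.text_encoder(class_names)         # (C, D)
        q = F.normalize(query_feats, dim=-1)              # (N, D)
        t = F.normalize(text, dim=-1)
        return self.logit_scale * q @ t.t()               # (N, C)

    def queries_for(self, task, gt_boxes=None):
        # Detection batches use only foreground queries; segmentation
        # batches use both, which reduces task interference.
        qs = [self.fg_queries.weight]
        if task == "segmentation":
            qs.append(self.bg_queries.weight)
        if gt_boxes is not None:
            # Box-conditioned queries: decode masks for given boxes
            # on detection data that lacks mask annotations.
            qs.append(self.box_embed(gt_boxes))
        return torch.cat(qs, dim=0)
```

Keeping the query sets separate lets detection batches skip the background queries entirely, which is one plausible reading of the foreground/background interference reduction the abstract refers to.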

ICCV 2023

Results from the Paper


Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Instance Segmentation | ADE20K val | OpenSeeD | AP | 42.6 | #2 |
| Panoptic Segmentation | ADE20K val | OpenSeeD (SwinL, single-scale, 1280x1280) | PQ | 53.7 | #2 |
| Instance Segmentation | Cityscapes val | OpenSeeD (SwinL, single-scale) | mask AP | 49.3 | #2 |
| Panoptic Segmentation | COCO minival | OpenSeeD (SwinL, single-scale) | PQ | 59.5 | #2 |
| Panoptic Segmentation | COCO minival | OpenSeeD (SwinL, single-scale) | AP | 53.2 | #1 |
| Zero-Shot Segmentation | Segmentation in the Wild | OpenSeeD | Mean AP | 36.1 | #7 |

Methods


No methods listed for this paper.