Open Vocabulary Semantic Segmentation

37 papers with code • 9 benchmarks • 4 datasets

Open-vocabulary semantic segmentation models aim to accurately assign a semantic label to each pixel in an image from a set of arbitrary open-vocabulary texts.
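
The definition above can be made concrete with a minimal sketch. It assumes that some image encoder has already produced one embedding per pixel and that a text encoder has produced one embedding per class name in the same space. Both are hypothetical inputs here, given as plain lists. Each pixel takes the label whose text embedding has the highest cosine similarity. The function names are illustrative and do not come from any specific model.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def label_pixels(pixel_embeddings, text_embeddings, class_names):
    """Give every pixel the class name whose text embedding is most similar.

    pixel_embeddings: H x W grid of D-dim vectors (assumed image-encoder output).
    text_embeddings:  one D-dim vector per entry of class_names (assumed
                      text-encoder output in the same embedding space).
    """
    labels = []
    for row in pixel_embeddings:
        label_row = []
        for pixel in row:
            scores = [cosine(pixel, t) for t in text_embeddings]
            best = max(range(len(scores)), key=scores.__getitem__)
            label_row.append(class_names[best])
        labels.append(label_row)
    return labels


# Toy usage: a 1 x 2 "image" and a two-word vocabulary.
print(label_pixels([[[0.9, 0.1], [0.2, 0.8]]], [[1.0, 0.0], [0.0, 1.0]], ["sky", "grass"]))
# → [['sky', 'grass']]
```

The vocabulary is only a list of strings supplied at inference time. In this sketch, changing it only requires new text embeddings, and the pixel side stays untouched.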

Benchmarks

These leaderboards are used to track progress in Open Vocabulary Semantic Segmentation

Dataset	Best Model
PASCAL Context-59	SILC
ADE20K-150	CAT-Seg
PascalVOC-20	SILC
ADE20K-847	CAT-Seg
PASCAL Context-459	SILC
COCO-Stuff-171	POMP
Cityscapes	FC-CLIP
PascalVOC-20b	CAT-Seg
Cityscape-171	PACL

Subtasks

Zero-Guidance Segmentation

Most implemented papers

Side Adapter Network for Open-Vocabulary Semantic Segmentation

mendelxu/san • CVPR 2023

A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.
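
A minimal sketch of how an additive attention bias can steer which image tokens a query attends to inside an attention layer. This is an illustration under assumptions, not SAN's code. Here the bias is derived from a binary mask by a hypothetical helper `mask_to_bias`, whereas in SAN it is predicted by the side network. The queries, keys and values are toy vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_to_bias(mask, blocked=-1e9):
    """Hypothetical helper: 0 for tokens inside the mask, a large negative value outside."""
    return [0.0 if inside else blocked for inside in mask]


def biased_attention(queries, keys, values, bias):
    """Single-head scaled dot-product attention with an additive bias.

    bias[i][j] is added to the logit of query i attending to key j before the
    softmax, so a large negative entry effectively hides key j from query i.
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias[i][j]
            for j, k in enumerate(keys)
        ]
        weights = softmax(logits)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


# Toy usage: one query for one mask proposal, three image tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [3.0], [100.0]]
bias = [mask_to_bias([True, True, False])]  # this proposal covers tokens 0 and 1
print(biased_attention([[0.0, 0.0]], keys, values, bias))
# → [[2.0]]
```

Because the bias is added to the logits rather than to the weights, the frozen attention weights of the backbone do not need to change. Only the bias decides which tokens each query can see.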

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

KU-CVLAB/CAT-Seg • 21 Mar 2023

Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions.
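
Cost aggregation operates on a cost volume of pixel-text similarities rather than on raw features. The sketch below makes two simplifying assumptions. First, it takes the cost to be the cosine similarity between per-pixel image embeddings and per-class text embeddings. Second, it replaces CAT-Seg's learned aggregation with a fixed 3x3 box filter, a hypothetical `box_aggregate`. It only illustrates why aggregating costs over neighbouring pixels can correct an isolated noisy prediction.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def cost_volume(pixel_embeddings, text_embeddings):
    """cost[y][x][n] = cosine similarity between pixel (y, x) and class n."""
    return [[[cosine(p, t) for t in text_embeddings] for p in row] for row in pixel_embeddings]


def box_aggregate(cost):
    """Average each class's cost over the 3x3 neighbourhood (clipped at borders).

    A fixed stand-in for learned spatial aggregation, used only for illustration.
    """
    h, w, n = len(cost), len(cost[0]), len(cost[0][0])
    out = [[[0.0] * n for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = [
                (yy, xx)
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            for c in range(n):
                out[y][x][c] = sum(cost[yy][xx][c] for yy, xx in neighbours) / len(neighbours)
    return out


def argmax_labels(cost):
    """Index of the highest-cost class at every pixel."""
    return [[max(range(len(v)), key=v.__getitem__) for v in row] for row in cost]


# Toy usage: a 3 x 3 image of class 0 with one noisy centre pixel leaning to class 1.
text = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[[1.0, 0.0]] * 3 for _ in range(3)]
pixels[1][1] = [0.4, 0.6]
raw = cost_volume(pixels, text)
print(argmax_labels(raw)[1][1])                 # → 1 (noisy pixel misread)
print(argmax_labels(box_aggregate(raw))[1][1])  # → 0 (neighbours outvote it)
```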

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

mendelxu/zsseg.baseline • 29 Dec 2021

Semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP operates on whole images.

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

xmed-lab/clip_surgery • 12 Apr 2023

Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision model that has demonstrated significant benefits for downstream tasks, including many zero-shot learning and text-guided vision tasks.

Panoptic Vision-Language Feature Fields

ethz-asl/autolabel • 11 Sep 2023

In this paper, we propose what is, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.

Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation

sinahmr/naclip • 12 Apr 2024

Existing approaches often rely on impractical supervised pre-training or on access to additional pre-trained networks.

Decoupling Zero-Shot Semantic Segmentation

dingjiansw101/zegformer • CVPR 2022

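We propose to decouple zero-shot semantic segmentation into two sub-tasks: 1) a class-agnostic grouping task that groups the pixels into segments, and 2) a zero-shot classification task on segments.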
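
The segment-classification sub-task, which labels segments rather than individual pixels, can be sketched as below. The sketch assumes the class-agnostic grouping step has already produced binary masks. It pools a segment embedding by averaging the pixel embeddings inside each mask. This pooling choice and the helper names are illustrative assumptions, not ZegFormer's architecture.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def masked_average(pixel_embeddings, mask):
    """Average the embeddings of the pixels where mask is True."""
    total, count = None, 0
    for emb_row, mask_row in zip(pixel_embeddings, mask):
        for emb, inside in zip(emb_row, mask_row):
            if inside:
                total = list(emb) if total is None else [a + b for a, b in zip(total, emb)]
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


def classify_segments(pixel_embeddings, masks, text_embeddings, class_names):
    """Label each class-agnostic segment with its most similar class name."""
    labels = []
    for mask in masks:
        segment = masked_average(pixel_embeddings, mask)
        scores = [cosine(segment, t) for t in text_embeddings]
        labels.append(class_names[max(range(len(scores)), key=scores.__getitem__)])
    return labels


# Toy usage: a 1 x 3 image split into two segments by an assumed grouping step.
pixels = [[[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]]
masks = [[[True, True, False]], [[False, False, True]]]
print(classify_segments(pixels, masks, [[1.0, 0.0], [0.0, 1.0]], ["cat", "dog"]))
# → ['cat', 'dog']
```

In this sketch each segment receives a single label, so every pixel in a segment shares it. Any errors therefore come from the grouping step or the segment classifier, and the two can be examined separately.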

Open-Vocabulary Universal Image Segmentation with MaskCLIP

mlpc-ucsd/maskclip • 18 Aug 2022

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic, instance, or panoptic segmentation (background semantic labeling plus foreground instance segmentation) for arbitrary categories described in text at inference time.

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

facebookresearch/ov-seg • CVPR 2023

We propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions.
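
A minimal sketch of how a masked image region might be turned into a training image for this kind of fine-tuning. Pixels outside the mask are replaced by a constant fill value, the result is cropped to the mask's bounding box, and the crop is paired with its text description. The fill value of 0, the nested-list image format and the helper names are assumptions made for illustration. The paper's actual preprocessing may differ.

```python
def bounding_box(mask):
    """Return (top, bottom, left, right) of the True pixels; bottom/right are exclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        raise ValueError("mask selects no pixels")
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def masked_crop(image, mask, fill=0):
    """Replace pixels outside the mask with `fill`, then crop to the mask's box."""
    top, bottom, left, right = bounding_box(mask)
    return [
        [image[y][x] if mask[y][x] else fill for x in range(left, right)]
        for y in range(top, bottom)
    ]


def make_training_pairs(image, masks, captions, fill=0):
    """Pair each masked region crop with its text description (hypothetical format)."""
    return [(masked_crop(image, m, fill), c) for m, c in zip(masks, captions)]


# Toy usage: a 3 x 3 single-channel image and one region mask.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[False, True, False], [False, True, True], [False, False, False]]
print(make_training_pairs(image, [mask], ["a hypothetical caption"]))
# → [([[2, 0], [5, 6]], 'a hypothetical caption')]
```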

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

chaofanma/fusioner • 27 Oct 2022

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.