Open Vocabulary Semantic Segmentation

28 papers with code • 9 benchmarks • 4 datasets

Most implemented papers

Side Adapter Network for Open-Vocabulary Semantic Segmentation

mendelxu/san CVPR 2023

A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.
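A rough sketch of what an attention bias can look like is given here. It assumes the bias is an additive term on the attention logits before the softmax; the excerpt above does not say which layers or tokens are involved, so the function name `biased_attention`, the single-query setup, and the toy values are all hypothetical and not taken from the mendelxu/san code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(query, keys, values, bias):
    """Single-query dot-product attention with an additive bias (hypothetical helper).

    query:  list[float] of length d
    keys:   n vectors of length d
    values: n vectors (any common length)
    bias:   list[float] of length n, added to the logits before softmax
    """
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
        for key, b in zip(keys, bias)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    pooled = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return pooled, weights


# Toy example: three patch tokens with identical keys. A large negative bias
# on the third patch (assumed to lie outside the mask) removes it from pooling.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0, -1e9]
pooled, weights = biased_attention(query, keys, values, bias)
```

With this bias, the pooled feature depends only on the first two patches, which is the sense in which a per-mask bias can steer a frozen model toward one region without changing its weights.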

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

KU-CVLAB/CAT-Seg 21 Mar 2023

However, transferring these capabilities, learned from image-level supervision, to the pixel-level task of segmentation while handling arbitrary unseen categories at inference makes this task challenging.

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

mendelxu/zsseg.baseline 29 Dec 2021

However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation works on pixels, while CLIP works on whole images.

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

xmed-lab/clip_surgery 12 Apr 2023

Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision model that has demonstrated significant benefits for downstream tasks, including many zero-shot learning and text-guided vision tasks.

Panoptic Vision-Language Feature Fields

ethz-asl/autolabel 11 Sep 2023

In this paper, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.

Open-Vocabulary Universal Image Segmentation with MaskCLIP

mlpc-ucsd/maskclip 18 Aug 2022

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories given as text descriptions at inference time.

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

facebookresearch/ov-seg CVPR 2023

To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions.
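As an illustration only, the kind of training pair described above (a masked image region plus its text description) might be assembled as follows. This is a minimal sketch under stated assumptions: the nested-list image layout, the zero fill value, and the helper names `mask_region` and `build_pairs` are hypothetical and are not taken from the facebookresearch/ov-seg code.

```python
def mask_region(image, mask, fill=0.0):
    """Keep pixels where the mask is truthy and replace the rest with `fill`.

    image: 2D list of pixel values (grayscale for simplicity)
    mask:  2D list of 0/1 flags with the same shape as `image`
    """
    return [
        [px if keep else fill for px, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def build_pairs(image, proposals):
    """Turn (mask, text) proposals into (masked_image, text) pairs."""
    return [(mask_region(image, mask), text) for mask, text in proposals]


# Toy example: a 2x2 image with two complementary region masks.
image = [[1.0, 2.0], [3.0, 4.0]]
proposals = [
    ([[1, 0], [0, 1]], "cat"),
    ([[0, 1], [1, 0]], "grass"),
]
pairs = build_pairs(image, proposals)
```

Each resulting pair could then be fed to an image-text contrastive objective; how the regions and descriptions are actually collected is not covered by the excerpt.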

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

chaofanma/fusioner 27 Oct 2022

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.

SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation

arrowluo/segclip 27 Nov 2022

The pre-trained model can capture rich visual concepts for images by learning from large-scale text-image data.