Open Vocabulary Semantic Segmentation

55 papers with code • 9 benchmarks • 4 datasets

Open-vocabulary semantic segmentation models aim to accurately assign a semantic label to each pixel in an image from a set of arbitrary open-vocabulary texts.
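As a rough illustration of the task definition above, the following minimal sketch assigns each pixel the label whose text embedding is most similar to that pixel's embedding. It assumes per-pixel and per-text embeddings already live in a shared space; the function name `segment_open_vocab`, the toy 2-D embeddings, and the label list are hypothetical and are not taken from any of the papers listed here.

```python
import numpy as np

def segment_open_vocab(pixel_emb, text_emb):
    """Assign each pixel the index of its most similar text embedding.

    pixel_emb: (H, W, D) per-pixel embeddings (assumed to come from some vision encoder)
    text_emb:  (K, D) one embedding per candidate label text
    Returns an (H, W) integer array of label indices.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    sim = p @ t.T                 # (H, W, K) similarity of every pixel to every label
    return sim.argmax(axis=-1)    # best-matching label index per pixel

# Toy example: two arbitrary label texts with made-up 2-D embeddings.
labels = ["cat", "grass"]
text_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
pixel_emb = np.array([[[0.9, 0.1], [0.2, 0.8]],
                      [[0.7, 0.3], [0.1, 0.9]]])
label_map = segment_open_vocab(pixel_emb, text_emb)
```

Because the label set is just a list of texts passed in at call time, changing the vocabulary only means swapping `text_emb`; nothing about the pixel embeddings has to change.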

Most implemented papers

Side Adapter Network for Open-Vocabulary Semantic Segmentation

mendelxu/san CVPR 2023

A side network with two branches is attached to a frozen CLIP model: one branch predicts mask proposals, and the other predicts an attention bias that is applied inside the CLIP model to recognize the class of each mask.
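To make the idea of an attention bias concrete, here is a generic sketch of single-head attention with an additive bias on the logits. This is not the SAN implementation; the function `biased_attention`, the shapes, and the toy values are assumptions chosen only to show how a bias can steer which positions a query attends to.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the logits.

    q: (N, D) queries; k, v: (M, D) keys and values; bias: (N, M).
    Returns the attended output (N, D) and the attention weights (N, M).
    """
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

# Toy example: one query over three positions; the bias suppresses position 2.
q = np.zeros((1, 2))
k = np.ones((3, 2))
v = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
bias = np.array([[0.0, 0.0, -1e9]])
out, weights = biased_attention(q, k, v, bias)
```

With a strongly negative bias on the last position, the query effectively ignores it and averages only the first two values, which is the kind of mask-guided focusing the description alludes to.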

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

KU-CVLAB/CAT-Seg CVPR 2024

Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions.

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

mendelxu/zsseg.baseline 29 Dec 2021

However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation works on pixels, while CLIP works on whole images.

A Closer Look at the Explainability of Contrastive Language-Image Pre-training

xmed-lab/clip_surgery 12 Apr 2023

These phenomena conflict with conventional explainability methods based on the class attention map (CAM), where the raw model can highlight the local foreground regions using global supervision without alignment.

Panoptic Vision-Language Feature Fields

ethz-asl/autolabel 11 Sep 2023

In this paper, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.

Learning Mask-aware CLIP Representations for Zero-Shot Segmentation

jiaosiyu1999/maft NeurIPS 2023

However, in this paper, we reveal that CLIP is insensitive to different mask proposals and tends to produce similar predictions for different mask proposals of the same image.

Open-Vocabulary Segmentation with Semantic-Assisted Calibration

yongliu20/SCAN CVPR 2024

We attribute this to the in-vocabulary embedding and domain-biased CLIP prediction.

SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images

likyoo/SegEarth-OV 2 Oct 2024

To tackle this issue, we propose a simple and general upsampler, SimFeatUp, to restore lost spatial information in deep features in a training-free style.

Open-Vocabulary Universal Image Segmentation with MaskCLIP

mlpc-ucsd/maskclip 18 Aug 2022

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories given as text descriptions at inference time.