Open Vocabulary Panoptic Segmentation

11 papers with code • 1 benchmark • 1 dataset

Open-vocabulary panoptic segmentation combines semantic labeling of background regions (stuff) with instance-level mask prediction for foreground objects (things), for arbitrary categories specified by text at inference time rather than a fixed training vocabulary.

Most implemented papers

Panoptic Vision-Language Feature Fields

ethz-asl/autolabel 11 Sep 2023

In this paper, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.

Extract Free Dense Labels from CLIP

chongzhou96/maskclip 2 Dec 2021

Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.

Open-Vocabulary Universal Image Segmentation with MaskCLIP

mlpc-ucsd/maskclip 18 Aug 2022

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic, instance, and panoptic segmentation (background semantic labeling plus foreground instance segmentation) for arbitrary categories described by text at inference time.
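
A thread common to the papers on this page is that category labels are not fixed at training time: visual embeddings are scored against CLIP text embeddings of whatever category names are supplied at inference. A minimal sketch of that classification step using the open_clip library (the category list, prompt template, and random "region embeddings" are illustrative placeholders, not any specific paper's pipeline):

```python
import torch
import open_clip

# Load a pretrained CLIP model and its tokenizer.
model, _, _ = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

# Categories supplied at inference time; nothing here was fixed during
# training, which is what makes the setup "open vocabulary".
class_names = ["cat", "traffic light", "grass"]  # illustrative placeholder
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    text_emb = model.encode_text(tokenizer(prompts))           # [K, D]
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Stand-in for per-mask visual embeddings from a segmentation model;
# random here only to keep the sketch runnable.
region_emb = torch.randn(5, text_emb.shape[-1])                # [N, D]
region_emb = region_emb / region_emb.norm(dim=-1, keepdim=True)

# Cosine similarity against text embeddings gives per-mask class scores.
logits = 100.0 * region_emb @ text_emb.T                       # [N, K]
print([class_names[i] for i in logits.argmax(dim=-1).tolist()])
```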

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

nvlabs/odise CVPR 2023

Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

bytedance/fc-clip NeurIPS 2023

The proposed FC-CLIP benefits from two observations: a frozen CLIP backbone retains its open-vocabulary classification ability while also serving as a strong mask generator, and a convolutional CLIP generalizes well to input resolutions larger than the one used during contrastive image-text pretraining.
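
FC-CLIP's single-backbone design rests on pooling the frozen CLIP feature map inside each predicted mask, so one encoder provides both mask features and open-vocabulary classification. A self-contained sketch of such a mask-pooling step (tensor shapes and inputs are toy placeholders, not FC-CLIP's actual code):

```python
import torch
import torch.nn.functional as F

def mask_pool(features: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Average a dense feature map inside each binary or soft mask.

    features: [C, H, W] frozen CLIP feature map.
    masks:    [N, h, w] mask predictions.
    returns:  [N, C] one pooled embedding per mask.
    """
    # Resize masks to the feature-map resolution.
    masks = F.interpolate(masks[None], size=features.shape[-2:],
                          mode="bilinear", align_corners=False)[0]
    masks = masks.clamp(0, 1)                                  # [N, H, W]
    # Weighted average of features under each mask.
    num = torch.einsum("chw,nhw->nc", features, masks)
    den = masks.sum(dim=(1, 2)).clamp(min=1e-6)[:, None]
    return num / den

# Toy inputs standing in for a frozen convolutional CLIP backbone's output
# and a mask head's predictions.
feats = torch.randn(512, 32, 32)
masks = torch.rand(10, 128, 128)
mask_embeddings = mask_pool(feats, masks)  # [10, 512], ready to be scored
                                           # against text embeddings
```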

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

wusize/clipself 2 Oct 2023

However, when transferring the vision-language alignment of CLIP from global image representations to local region representations for open-vocabulary dense prediction tasks, CLIP ViTs suffer from a domain shift from full images to local image regions.
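
CLIPSelf counters this domain shift by letting the ViT distill itself: dense features pooled inside sampled image regions are pulled toward the embeddings the frozen model produces for the corresponding crops. A toy sketch of such a cosine distillation objective (the function name and tensor shapes are illustrative, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def self_distill_loss(student_region_emb: torch.Tensor,
                      teacher_crop_emb: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity distillation between region and crop embeddings.

    student_region_emb: [N, D] dense student-ViT features pooled inside
                        N sampled image regions.
    teacher_crop_emb:   [N, D] image embeddings the frozen teacher CLIP
                        produces for the N corresponding crops.
    """
    s = F.normalize(student_region_emb, dim=-1)
    t = F.normalize(teacher_crop_emb, dim=-1)
    # Maximize cosine similarity between matching region/crop pairs;
    # gradients flow into the student only.
    return (1.0 - (s * t).sum(dim=-1)).mean()

# Toy tensors standing in for the two embedding sources.
loss = self_distill_loss(torch.randn(8, 512), torch.randn(8, 512))
print(float(loss))
```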

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

lygsbw/umg-clip 12 Jan 2024

Vision-language foundation models, represented by Contrastive Language-Image Pre-training (CLIP), have gained increasing attention for jointly understanding vision and language tasks.

PosSAM: Panoptic Open-vocabulary Segment Anything

Vibashan/PosSAM 14 Mar 2024

In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework.
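
PosSAM fuses the two models end-to-end, but the underlying division of labor (SAM proposes class-agnostic masks, CLIP names them) can be illustrated with a naive two-stage pipeline using the segment_anything and open_clip libraries; the checkpoint path, image file, and class list below are placeholders, and this is not PosSAM's method:

```python
import numpy as np
import torch
import open_clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Class-agnostic masks from SAM (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("scene.jpg").convert("RGB"))  # placeholder image
masks = mask_generator.generate(image)  # list of dicts with a "segmentation" key

# Open-vocabulary labels for each mask via CLIP on the masked crop.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
class_names = ["person", "bicycle", "road"]  # illustrative, chosen at inference
with torch.no_grad():
    text = model.encode_text(tokenizer([f"a photo of a {c}" for c in class_names]))
    text = text / text.norm(dim=-1, keepdim=True)
    for m in masks:
        # Crop the image to the mask's bounding box and classify it.
        ys, xs = np.where(m["segmentation"])
        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        img = preprocess(Image.fromarray(crop)).unsqueeze(0)
        emb = model.encode_image(img)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        print(class_names[(emb @ text.T).argmax().item()])
```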

Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation

jiaosiyu1999/MAFT-Plus 1 Aug 2024

In this way, the vision and text representations of CLIP are optimized collaboratively, enhancing the alignment of the vision-text feature space.

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

nhw649/eov-seg 11 Dec 2024

To the best of our knowledge, EOV-Seg is the first open-vocabulary panoptic segmentation framework designed for efficiency: it runs faster than state-of-the-art methods while achieving competitive performance.