Open Vocabulary Panoptic Segmentation

7 papers with code • 1 benchmark • 1 dataset

Open-vocabulary panoptic segmentation unifies semantic and instance segmentation: an image is partitioned into background "stuff" regions and foreground "thing" instances, with the category set specified by free-form text at inference time rather than fixed at training time.

Most implemented papers

Panoptic Vision-Language Feature Fields

ethz-asl/autolabel 11 Sep 2023

In this paper, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.

Extract Free Dense Labels from CLIP

chongzhou96/maskclip 2 Dec 2021

Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.

Open-Vocabulary Universal Image Segmentation with MaskCLIP

mlpc-ucsd/maskclip 18 Aug 2022

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic, instance, and panoptic segmentation (background semantic labeling plus foreground instance segmentation) for arbitrary categories described by text at inference time.
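
The core open-vocabulary step shared by this line of work is matching per-mask visual features against CLIP text embeddings of whatever category names the user supplies at inference time. A minimal sketch of that classification step, assuming the OpenAI clip package; the per-mask embeddings here are random stand-ins (the pooling that would produce them is not shown), so this is an illustration, not MaskCLIP's implementation:

```python
# Classify predicted masks against an arbitrary, user-chosen vocabulary
# via cosine similarity with CLIP text embeddings.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# The vocabulary is chosen at inference time, not fixed at training time.
class_names = ["dog", "grass", "frisbee", "sky"]
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    text_emb = F.normalize(model.encode_text(prompts).float(), dim=-1)  # (C, D)

# mask_emb: one pooled visual embedding per predicted mask (hypothetical input,
# e.g. dense image features average-pooled inside each binary mask).
mask_emb = F.normalize(torch.randn(10, text_emb.shape[-1], device=device), dim=-1)

logits = 100.0 * mask_emb @ text_emb.T   # (num_masks, C) similarity scores
labels = logits.argmax(dim=-1)           # open-vocabulary class per mask
print([class_names[i] for i in labels.tolist()])
```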

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

nvlabs/odise CVPR 2023

Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

bytedance/fc-clip NeurIPS 2023

The proposed FC-CLIP benefits from two observations: the frozen CLIP backbone retains its open-vocabulary classification ability and can also serve as a strong mask generator, and the convolutional CLIP generalizes well to input resolutions larger than the one used during contrastive image-text pretraining.
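
A minimal sketch of the mask-pooling idea this rests on: average the frozen CLIP feature map inside each predicted binary mask, then score the pooled vectors against text embeddings. Tensor shapes and names are illustrative stand-ins, not the official FC-CLIP code:

```python
# Mask pooling over a frozen-CLIP feature map, followed by
# open-vocabulary classification against text embeddings.
import torch
import torch.nn.functional as F

def mask_pool(features, masks, eps=1e-6):
    """features: (D, H, W) frozen-CLIP feature map; masks: (M, H, W) in [0, 1]."""
    m = masks.flatten(1)                              # (M, H*W)
    f = features.flatten(1)                           # (D, H*W)
    pooled = m @ f.T                                  # (M, D) sum of features in mask
    return pooled / (m.sum(dim=1, keepdim=True) + eps)

features = torch.randn(512, 32, 32)                   # stand-in for CLIP backbone output
masks = (torch.rand(5, 32, 32) > 0.5).float()         # stand-in mask predictions
text_emb = F.normalize(torch.randn(4, 512), dim=-1)   # stand-in CLIP text embeddings (C, D)

mask_emb = F.normalize(mask_pool(features, masks), dim=-1)
logits = mask_emb @ text_emb.T                        # (M, C) open-vocabulary scores
```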

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

wusize/clipself 2 Oct 2023

However, when transferring the vision-language alignment of CLIP from global image representations to local region representations for open-vocabulary dense prediction tasks, CLIP ViTs suffer from a domain shift from full images to local image regions.
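
A minimal sketch of a CLIPSelf-style self-distillation objective under these assumptions: a frozen teacher CLIP embeds image crops, and the student's dense features, pooled over the same regions (e.g. via RoIAlign), are trained to match them. All names and shapes here are hypothetical, illustrating the loss rather than the paper's code:

```python
# Self-distillation loss: push student region embeddings toward the frozen
# teacher's image-level embeddings of the corresponding crops.
import torch
import torch.nn.functional as F

def clipself_loss(student_region_emb, teacher_crop_emb):
    """Both (N, D). Maximize cosine similarity between region and crop embeddings."""
    s = F.normalize(student_region_emb, dim=-1)
    t = F.normalize(teacher_crop_emb, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()

# teacher_crop_emb: frozen CLIP ViT applied to N image crops (no gradient).
# student_region_emb: the trainable ViT's dense features pooled over the same
# N crop regions (hypothetical inputs).
student_region_emb = torch.randn(8, 512, requires_grad=True)
with torch.no_grad():
    teacher_crop_emb = torch.randn(8, 512)

loss = clipself_loss(student_region_emb, teacher_crop_emb)
loss.backward()
```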

PosSAM: Panoptic Open-vocabulary Segment Anything

Vibashan/PosSAM 14 Mar 2024

In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework.
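
A hedged sketch of the underlying combination as a naive two-stage baseline (PosSAM itself trains this end-to-end): SAM proposes class-agnostic masks and CLIP labels each region from an open vocabulary. It uses the public segment-anything and clip APIs; the checkpoint and image paths are placeholders:

```python
# Naive SAM + CLIP pipeline: class-agnostic mask proposals, then
# open-vocabulary labeling of each masked region with CLIP.
import numpy as np
import torch
import torch.nn.functional as F
import clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)
clip_model, preprocess = clip.load("ViT-B/32", device=device)

image = Image.open("example.jpg").convert("RGB")
masks = mask_generator.generate(np.array(image))      # class-agnostic proposals

class_names = ["person", "bicycle", "road", "tree"]   # chosen at inference time
tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    text_emb = F.normalize(clip_model.encode_text(tokens).float(), dim=-1)
    for m in masks:
        x, y, w, h = m["bbox"]                        # XYWH crop around the mask
        crop = preprocess(image.crop((x, y, x + w, y + h))).unsqueeze(0).to(device)
        img_emb = F.normalize(clip_model.encode_image(crop).float(), dim=-1)
        label = class_names[(img_emb @ text_emb.T).argmax().item()]
        print(label, m["area"])
```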