Open Vocabulary Panoptic Segmentation

11 papers with code • 1 benchmark • 1 dataset

Open-vocabulary panoptic segmentation combines semantic labeling of background regions (stuff) with instance-level mask prediction for foreground objects (things), for arbitrary categories specified by text at inference time rather than a fixed training vocabulary.

Most implemented papers

Panoptic Vision-Language Feature Fields

ethz-asl/autolabel 11 Sep 2023

In this paper, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.

Extract Free Dense Labels from CLIP

chongzhou96/maskclip 2 Dec 2021

Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.

Open-Vocabulary Universal Image Segmentation with MaskCLIP

mlpc-ucsd/maskclip 18 Aug 2022

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic, instance, and panoptic segmentation (background semantic labeling plus foreground instance segmentation) for arbitrary categories described by text at inference time.
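
A thread common to the papers on this page is that category labels are not fixed at training time: visual embeddings are scored against CLIP text embeddings of whatever category names are supplied at inference. A minimal sketch of that classification step using the open_clip library (the category list, prompt template, and random "region embeddings" are illustrative placeholders, not any specific paper's pipeline):

```python
import torch
import open_clip

# Load a pretrained CLIP model and its tokenizer.
model, _, _ = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

# Categories supplied at inference time; nothing here was fixed during
# training, which is what makes the setup "open vocabulary".
class_names = ["cat", "traffic light", "grass"]  # illustrative placeholder
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    text_emb = model.encode_text(tokenizer(prompts))           # [K, D]
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Stand-in for per-mask visual embeddings from a segmentation model;
# random here only to keep the sketch runnable.
region_emb = torch.randn(5, text_emb.shape[-1])                # [N, D]
region_emb = region_emb / region_emb.norm(dim=-1, keepdim=True)

# Cosine similarity against text embeddings gives per-mask class scores.
logits = 100.0 * region_emb @ text_emb.T                       # [N, K]
print([class_names[i] for i in logits.argmax(dim=-1).tolist()])
```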

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

nvlabs/odise CVPR 2023

Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

bytedance/fc-clip NeurIPS 2023

The proposed FC-CLIP benefits from two observations: a frozen CLIP backbone retains its open-vocabulary classification ability while also serving as a strong mask generator, and a convolutional CLIP generalizes well to input resolutions larger than the one used during contrastive image-text pretraining.
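
FC-CLIP's single-backbone design rests on pooling the frozen CLIP feature map inside each predicted mask, so one encoder provides both mask features and open-vocabulary classification. A self-contained sketch of such a mask-pooling step (tensor shapes and inputs are toy placeholders, not FC-CLIP's actual code):

```python
import torch
import torch.nn.functional as F

def mask_pool(features: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Average a dense feature map inside each binary or soft mask.

    features: [C, H, W] frozen CLIP feature map.
    masks:    [N, h, w] mask predictions.
    returns:  [N, C] one pooled embedding per mask.
    """
    # Resize masks to the feature-map resolution.
    masks = F.interpolate(masks[None], size=features.shape[-2:],
                          mode="bilinear", align_corners=False)[0]
    masks = masks.clamp(0, 1)                                  # [N, H, W]
    # Weighted average of features under each mask.
    num = torch.einsum("chw,nhw->nc", features, masks)
    den = masks.sum(dim=(1, 2)).clamp(min=1e-6)[:, None]
    return num / den

# Toy inputs standing in for a frozen convolutional CLIP backbone's output
# and a mask head's predictions.
feats = torch.randn(512, 32, 32)
masks = torch.rand(10, 128, 128)
mask_embeddings = mask_pool(feats, masks)  # [10, 512], ready to be scored
                                           # against text embeddings
```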

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

wusize/clipself 2 Oct 2023

However, when transferring the vision-language alignment of CLIP from global image representations to local region representations for open-vocabulary dense prediction tasks, CLIP ViTs suffer from a domain shift from full images to local image regions.
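
CLIPSelf counters this domain shift by letting the ViT distill itself: dense features pooled inside sampled image regions are pulled toward the embeddings the frozen model produces for the corresponding crops. A toy sketch of such a cosine distillation objective (the function name and tensor shapes are illustrative, not the paper's implementation):

```python
import torch
import torch.nn.functional as F

def self_distill_loss(student_region_emb: torch.Tensor,
                      teacher_crop_emb: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity distillation between region and crop embeddings.

    student_region_emb: [N, D] dense student-ViT features pooled inside
                        N sampled image regions.
    teacher_crop_emb:   [N, D] image embeddings the frozen teacher CLIP
                        produces for the N corresponding crops.
    """
    s = F.normalize(student_region_emb, dim=-1)
    t = F.normalize(teacher_crop_emb, dim=-1)
    # Maximize cosine similarity between matching region/crop pairs;
    # gradients flow into the student only.
    return (1.0 - (s * t).sum(dim=-1)).mean()

# Toy tensors standing in for the two embedding sources.
loss = self_distill_loss(torch.randn(8, 512), torch.randn(8, 512))
print(float(loss))
```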

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

lygsbw/umg-clip 12 Jan 2024

Vision-language foundation models, represented by Contrastive Language-Image Pre-training (CLIP), have gained increasing attention for jointly understanding vision and language tasks.

PosSAM: Panoptic Open-vocabulary Segment Anything

Vibashan/PosSAM 14 Mar 2024

In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework.
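
PosSAM fuses the two models end-to-end, but the underlying division of labor (SAM proposes class-agnostic masks, CLIP names them) can be illustrated with a naive two-stage pipeline using the segment_anything and open_clip libraries; the checkpoint path, image file, and class list below are placeholders, and this is not PosSAM's method:

```python
import numpy as np
import torch
import open_clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Class-agnostic masks from SAM (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("scene.jpg").convert("RGB"))  # placeholder image
masks = mask_generator.generate(image)  # list of dicts with a "segmentation" key

# Open-vocabulary labels for each mask via CLIP on the masked crop.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
class_names = ["person", "bicycle", "road"]  # illustrative, chosen at inference
with torch.no_grad():
    text = model.encode_text(tokenizer([f"a photo of a {c}" for c in class_names]))
    text = text / text.norm(dim=-1, keepdim=True)
    for m in masks:
        # Crop the image to the mask's bounding box and classify it.
        ys, xs = np.where(m["segmentation"])
        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        img = preprocess(Image.fromarray(crop)).unsqueeze(0)
        emb = model.encode_image(img)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        print(class_names[(emb @ text.T).argmax().item()])
```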

Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation

jiaosiyu1999/MAFT-Plus 1 Aug 2024

In this way, the vision and text representations of CLIP are optimized collaboratively, enhancing the alignment of the vision-text feature space.

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

nhw649/eov-seg 11 Dec 2024

To the best of our knowledge, EOV-Seg is the first open-vocabulary panoptic segmentation framework designed for efficiency: it runs faster than state-of-the-art methods while achieving competitive performance.