Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images with diverse open-vocabulary language descriptions. This demonstrates that their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP, on the other hand, are good at classifying images into open-vocabulary labels. We leverage the frozen internal representations of both these models to perform panoptic segmentation of any category in the wild. Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks. In particular, with COCO training only, our method achieves 23.4 PQ and 30.0 mIoU on the ADE20K dataset, with 8.3 PQ and 7.9 mIoU absolute improvement over the previous state of the art. We open-source our code and models at https://github.com/NVlabs/ODISE .

CVPR 2023

Results from the Paper


Ranked #2 on Open-World Instance Segmentation on UVO (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Open Vocabulary Panoptic Segmentation | ADE20K | ODISE (Caption) | PQ | 23.4 | #4 |
| Open Vocabulary Panoptic Segmentation | ADE20K | ODISE (Label) | PQ | 22.6 | #5 |
| Open Vocabulary Semantic Segmentation | ADE20K-150 | ODISE | mIoU | 29.9 | #10 |
| Open Vocabulary Semantic Segmentation | ADE20K-847 | ODISE | mIoU | 11.1 | #10 |
| Open Vocabulary Semantic Segmentation | PASCAL Context-459 | ODISE | mIoU | 14.5 | #8 |
| Open Vocabulary Semantic Segmentation | PASCAL Context-59 | ODISE | mIoU | 57.3 | #9 |
| Open Vocabulary Semantic Segmentation | PascalVOC-20 | ODISE | mIoU | 84.6 | #11 |
| Zero Shot Segmentation | Segmentation in the Wild | ODISE | Mean AP | 38.7 | #6 |
| Open-World Instance Segmentation | UVO | ODISE | ARmask | 57.7 | #2 |
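
For readers unfamiliar with the PQ metric reported above, the following is a minimal sketch of how panoptic quality is commonly computed for a single category, assuming the standard definition: the sum of IoUs over matched segment pairs divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment match only if their IoU exceeds 0.5. The function name `panoptic_quality` and its arguments are hypothetical and are not taken from the ODISE code base; the numbers in the usage example are made up for illustration.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-category PQ under the standard definition (illustrative sketch).

    matched_ious: IoU values of matched (prediction, ground truth) pairs;
                  each must exceed 0.5 to count as a true positive.
    num_false_positives: predicted segments with no matching ground truth.
    num_false_negatives: ground-truth segments with no matching prediction.
    """
    if any(iou <= 0.5 or iou > 1.0 for iou in matched_ious):
        raise ValueError("matched IoUs must lie in (0.5, 1.0]")

    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No segments of this category in prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


# Illustrative values only: two matches, one spurious prediction, one miss.
example_pq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
```

In this example the two matched IoUs sum to 1.4 and the denominator is 2 + 0.5 + 0.5 = 3, so the per-category PQ is about 0.47. A dataset-level PQ such as the 23.4 reported for ADE20K is typically the average of per-category values, expressed as a percentage.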

Methods