OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

1 Sep 2023 · Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby

Current 3D open-vocabulary scene understanding methods mostly rely on well-aligned 2D images as a bridge for learning 3D features with language. However, these approaches are difficult to apply in scenarios where 2D images are absent. In this work, we introduce OpenIns3D, a new pipeline for instance-level 3D open-vocabulary scene understanding that requires no 2D image inputs. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask proposals in 3D point clouds. The "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision-language models to extract objects of interest. The "Lookup" module searches through the outcomes of "Snap" with the help of Mask2Pixel maps, which contain the precise correspondence between 3D masks and synthetic images, to assign category names to the proposed masks. This flexible approach, which needs no 2D inputs, achieves state-of-the-art results by a large margin on a wide range of indoor and outdoor datasets. Moreover, OpenIns3D allows 2D detectors to be swapped without re-training. When integrated with powerful 2D open-world models such as ODISE and GroundingDINO, it achieves excellent results on open-vocabulary instance segmentation. When integrated with LLM-powered 2D models like LISA, it demonstrates a remarkable capacity to process highly complex text queries that require intricate reasoning and world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/
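
The abstract does not spell out how "Lookup" matches 3D masks to 2D outputs, so the following Python sketch is an illustration under simplifying assumptions, not the authors' implementation. It assumes a single synthetic view rendered by a pinhole camera with identity rotation, builds a Mask2Pixel-style map with a z-buffer (so occluded points do not claim pixels), and labels each 3D mask with the 2D detection whose pixel set has the highest IoU with the mask's visible pixels. `Camera`, `mask_to_pixels`, `lookup`, and the `min_iou` threshold are hypothetical names, and the 2D detector output (e.g. from ODISE or GroundingDINO) is stubbed as labeled pixel sets.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int]


@dataclass
class Camera:
    """Hypothetical pinhole camera: identity rotation, looking down +z."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int

    def project(self, p: Point) -> Optional[Tuple[Pixel, float]]:
        x, y, z = p
        if z <= 0:
            return None  # point is behind the camera
        u = int(round(self.fx * x / z + self.cx))
        v = int(round(self.fy * y / z + self.cy))
        if 0 <= u < self.width and 0 <= v < self.height:
            return (u, v), z
        return None  # projects outside the synthetic image


def mask_to_pixels(points: Sequence[Point],
                   masks: Dict[int, Sequence[int]],
                   cam: Camera) -> Dict[int, Set[Pixel]]:
    """Mask2Pixel-style map: each pixel goes to the mask owning its nearest point."""
    zbuf: Dict[Pixel, Tuple[float, int]] = {}
    for mask_id, idxs in masks.items():
        for i in idxs:
            hit = cam.project(points[i])
            if hit is None:
                continue
            pix, depth = hit
            if pix not in zbuf or depth < zbuf[pix][0]:
                zbuf[pix] = (depth, mask_id)
    out: Dict[int, Set[Pixel]] = {m: set() for m in masks}
    for pix, (_, mask_id) in zbuf.items():
        out[mask_id].add(pix)
    return out


def lookup(mask_pixels: Dict[int, Set[Pixel]],
           detections: List[Tuple[str, Set[Pixel]]],
           min_iou: float = 0.25) -> Dict[int, Optional[str]]:
    """Give each mask the label of the best-overlapping 2D detection, or None."""
    labels: Dict[int, Optional[str]] = {}
    for mask_id, pix in mask_pixels.items():
        best_label, best_iou = None, 0.0
        for label, det_pix in detections:
            union = len(pix | det_pix)
            iou = len(pix & det_pix) / union if union else 0.0
            if iou > best_iou:
                best_label, best_iou = label, iou
        labels[mask_id] = best_label if best_iou >= min_iou else None
    return labels


# Usage: two masks; mask 1's point at depth 2 is hidden behind mask 0 at (2, 2).
cam = Camera(fx=1.0, fy=1.0, cx=2.0, cy=2.0, width=5, height=5)
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (-1.0, -1.0, 1.0), (0.0, 0.0, 2.0)]
masks = {0: [0, 1], 1: [2, 3]}
dets = [("chair", {(2, 2), (3, 2), (4, 2)}), ("lamp", {(1, 1)})]
print(lookup(mask_to_pixels(points, masks, cam), dets))  # -> {0: 'chair', 1: 'lamp'}
```

With several snapshots at different scales, one plausible extension is to accumulate IoU scores per label across views before taking the maximum; how OpenIns3D actually aggregates views is not stated in the abstract.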

Benchmark results (model: OpenIns3D)

Task                                       Dataset                    Metric              Value  Global rank
3D Open-Vocabulary Instance Segmentation   Replica                    mAP                 13.6   #2
3D Open-Vocabulary Instance Segmentation   S3DIS                      AP50 Novel B8/N4    37.0   #1
3D Open-Vocabulary Instance Segmentation   S3DIS                      AP50 Novel B6/N6    33.0   #1
3D Open-Vocabulary Object Detection        ScanNet on unseen classes  AP25                43.7   #1
Zero-shot 3D Point Cloud Classification    ScanNetV2                  Top-1 Accuracy (%)  60.8   #1
3D Open-Vocabulary Instance Segmentation   STPLS3D                    AP50                13.3   #1
