Query Refinement Transformer for 3D Instance Segmentation

3D instance segmentation aims to predict a set of object instances in a scene and to represent them as binary foreground masks with corresponding semantic labels. However, object instances are diverse in shape and category, and point clouds are usually sparse, unordered, and irregular, which leads to a query sampling dilemma. In addition, noisy background queries interfere with proper scene perception and accurate instance segmentation. To address these issues, we propose a Query Refinement Transformer, termed QueryFormer. The key to our approach is a query initialization module that optimizes the initialization process so that the query distribution has high coverage and a low repetition rate. Additionally, we design an affiliated transformer decoder that suppresses the interference of noisy background queries and helps the foreground queries focus on instance-discriminative parts to predict the final segmentation results. Extensive experiments on the ScanNetV2 and S3DIS datasets show that our QueryFormer can surpass state-of-the-art 3D instance segmentation methods.
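
To make "high coverage" and "low repetition rate" concrete, here is a minimal sketch of how the two quantities could be measured for a set of sampled queries. The abstract does not give formulas, so the definitions below are illustrative assumptions: coverage is taken as the fraction of ground-truth instances hit by at least one query, and repetition rate as the fraction of foreground queries that land on an instance another query already covers. The function name and the `background_id` convention are hypothetical.

```python
from collections import Counter


def coverage_and_repetition(query_instance_ids, num_instances, background_id=-1):
    """Return (coverage, repetition_rate) for a set of sampled queries.

    query_instance_ids[i] is the ground-truth instance that query i falls on,
    or background_id if it falls on background. These definitions are
    assumptions made for illustration, not taken from the paper.
    """
    # Keep only queries that landed on some object instance.
    foreground = [i for i in query_instance_ids if i != background_id]
    hits = Counter(foreground)

    # Coverage: share of instances that received at least one query.
    coverage = len(hits) / num_instances if num_instances else 0.0

    # Repetition: every query beyond the first on an instance is redundant.
    repeated = sum(count - 1 for count in hits.values())
    repetition = repeated / len(foreground) if foreground else 0.0
    return coverage, repetition


# Hypothetical scene: 4 instances, 6 queries, one of them on background.
queries = [0, 0, 1, 3, 3, -1]
coverage, repetition = coverage_and_repetition(queries, num_instances=4)
```

Under these assumed definitions, a good initialization pushes coverage toward 1 and repetition toward 0, while the background query in the example is the kind of noisy query the decoder is meant to suppress.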

Task                      Dataset      Model        Metric Name  Metric Value  Global Rank
3D Instance Segmentation  ScanNet(v2)  QueryFormer  mAP          58.3          #2
3D Instance Segmentation  ScanNet(v2)  QueryFormer  mAP@50       78.7          #3
3D Instance Segmentation  ScanNet(v2)  QueryFormer  mAP@25       87.4          #4
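
The mAP@50 and mAP@25 figures depend on an intersection-over-union (IoU) test between predicted and ground-truth binary masks, with thresholds of 0.5 and 0.25 respectively, as these metrics are conventionally defined. The sketch below shows only that matching test on a toy pair of per-point masks; the full metric also ranks predictions by confidence and averages precision over classes, which is omitted here. The function name and the example masks are hypothetical.

```python
def mask_iou(pred, gt):
    """IoU of two binary masks given as equal-length sequences of 0/1 per point."""
    if len(pred) != len(gt):
        raise ValueError("masks must cover the same points")
    intersection = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Two empty masks have no overlap to measure; treat the IoU as 0.
    return intersection / union if union else 0.0


# Toy masks over 8 points: 2 points overlap, 5 points are in the union.
pred_mask = [1, 1, 1, 0, 0, 0, 0, 0]
gt_mask = [0, 1, 1, 1, 1, 0, 0, 0]

iou = mask_iou(pred_mask, gt_mask)
matches_at_25 = iou >= 0.25  # counts as a true positive for mAP@25
matches_at_50 = iou >= 0.50  # counts as a true positive for mAP@50
```

A prediction like this one would count toward mAP@25 but not mAP@50, which is why the looser threshold yields the higher score in the table.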