MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Nowadays leveraging 2D images and pre-trained models to guide 3D point cloud feature representation has shown a remarkable potential to boost the performance of 3D fundamental models. While some works rely on additional data such as 2D real-world images and their corresponding camera poses recent studies target at using point cloud exclusively by designing 3D-to-2D projection. However in the indoor scene scenario existing 3D-to-2D projection strategies suffer from severe occlusions and incoherence which fail to contain sufficient information for fine-grained point cloud segmentation task. In this paper we argue that the crux of the matter resides in the basic premise of existing projection strategies that the medium is homogeneous thereby projection rays propagate along straight lines and behind objects are occluded by front ones. Inspired by the phenomenon of mirage where the occluded objects are exposed by distorted light rays due to heterogeneous medium refraction rate we propose MirageRoom by designing parametric mirage projection with heterogeneous medium to obtain series of projected images with various distorted degrees. We further develop a masked reprojection module across 2D and 3D latent space to bridge the gap between pre-trained 2D backbone and 3D point-wise features. Both quantitative and qualitative experimental results on S3DIS and ScanNet V2 demonstrate the effectiveness of our method.
PDF Abstract