Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering a point cloud with disparity estimation and then applying a 3D detector. The disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the scarcity of disparity annotations for training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without requiring LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
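The step of recovering a point cloud from disparity only for pixels on an object can be illustrated with the standard rectified-stereo relation Z = f * B / d. The following is a minimal sketch under that assumption, not the paper's implementation: the function name, the argument names (`fx`, `fy`, `cx`, `cy`, `baseline`), and the boolean instance mask are all hypothetical and chosen for illustration.

```python
import numpy as np

def instance_disparity_to_points(disparity, mask, fx, fy, cx, cy, baseline):
    """Back-project disparity inside an instance mask to 3D points.

    Assumes a rectified stereo pair and a pinhole camera model; all
    parameter names are illustrative, not taken from the paper.
    Returns an (N, 3) array of points in the left camera frame.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    # Keep only pixels on the object that have a positive disparity.
    valid = np.asarray(mask, dtype=bool) & (disparity > 0)
    v, u = np.nonzero(valid)          # row (v) and column (u) indices
    d = disparity[v, u]
    z = fx * baseline / d             # depth from disparity
    x = (u - cx) * z / fx             # horizontal offset from principal point
    y = (v - cy) * z / fy             # vertical offset from principal point
    return np.stack([x, y, z], axis=1)

# Toy usage: a 4x4 disparity map with a single object pixel at row 1, column 2.
disp = np.full((4, 4), 10.0)
obj_mask = np.zeros((4, 4), dtype=bool)
obj_mask[1, 2] = True
points = instance_disparity_to_points(disp, obj_mask, fx=100.0, fy=100.0,
                                      cx=0.0, cy=0.0, baseline=0.5)
```

Because only masked pixels are back-projected, the resulting point cloud contains the object alone, which is the input a downstream 3D detector would consume in a pipeline of this kind.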

CVPR 2020

Datasets


Task                                    Dataset                     Model               Metric Name                     Metric Value  Global Rank
Vehicle Pose Estimation                 KITTI Cars Hard             Disp-RCNN (Stereo)  Average Orientation Similarity  67.16         # 14
3D Object Detection From Stereo Images  KITTI Cars Moderate         Disp R-CNN          AP75                            45.78         # 6
3D Object Detection From Stereo Images  KITTI Cyclists Moderate     Disp R-CNN          AP50                            24.40         # 3
3D Object Detection From Stereo Images  KITTI Pedestrians Moderate  Disp R-CNN          AP50                            25.80         # 3

Methods