ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

2 Jun 2021 · Danila Rukhovich, Anna Vorontsova, Anton Konushin

In this paper, we formulate the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. To address this problem, we propose ImVoxelNet, a novel fully convolutional method for 3D object detection based on monocular or multi-view RGB images. The number of monocular images in each multi-view input can vary during training and inference; in fact, this number may be different for each multi-view input. ImVoxelNet successfully handles both indoor and outdoor scenes, which makes it general-purpose. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. Moreover, it surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset. On ScanNet, ImVoxelNet sets a new benchmark for multi-view 3D object detection. The source code and the trained models are available at https://github.com/saic-vul/imvoxelnet.
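
The abstract does not spell out how image features reach the voxel grid, so the following is only a minimal sketch of one way an image-to-voxels projection that tolerates a varying number of views could look. It assumes pinhole cameras given as 3x4 projection matrices, nearest-pixel sampling, and plain averaging over the views that see a voxel. The names `project_point` and `fill_voxels` are hypothetical and are not taken from the ImVoxelNet repository.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]      # assumed 3x4 projection matrix (intrinsics times extrinsics)
FeatureMap = List[List[List[float]]]    # feature_map[row][col] -> list of C channel values
Point = Tuple[float, float, float]


def project_point(P: Matrix, xyz: Point) -> Optional[Tuple[int, int]]:
    """Project a 3D point to an integer pixel (col, row); None if it lies behind the camera."""
    x, y, z = xyz
    u, v, w = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in P)
    if w <= 1e-6:
        return None
    return math.floor(u / w), math.floor(v / w)


def fill_voxels(voxel_centers: Sequence[Point],
                views: Sequence[Tuple[Matrix, FeatureMap]]) -> List[List[float]]:
    """Average, per voxel, the features of every view whose image contains the voxel.

    The number of views is arbitrary (one for the monocular case). Voxels that
    no view observes are filled with zeros (an assumption of this sketch).
    """
    channels = len(views[0][1][0][0])
    volume = []
    for center in voxel_centers:
        total = [0.0] * channels
        hits = 0
        for P, fmap in views:
            pixel = project_point(P, center)
            if pixel is None:
                continue
            col, row = pixel
            if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
                total = [t + f for t, f in zip(total, fmap[row][col])]
                hits += 1
        volume.append([t / hits for t in total] if hits else [0.0] * channels)
    return volume
```

Calling `fill_voxels` with a single `(P, feature_map)` pair corresponds to the monocular setting, while passing several pairs corresponds to the multi-view setting. The averaging step is what lets the number of views differ from one input to the next in this sketch.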

Task                           Dataset     Model                  Metric Name             Metric Value  Global Rank
3D Object Detection            DAIR-V2X-I  ImVoxelNet             AP|R40(moderate)        37.6          # 9
                                                                  AP|R40(easy)            44.8          # 9
                                                                  AP|R40(hard)            37.6          # 9
3D Object Detection            ScanNetV2   ImVoxelNet (RGB only)  mAP@0.25                48.1          # 25
                                                                  mAP@0.5                 22.7          # 25
Monocular 3D Object Detection  SUN RGB-D   ImVoxelNet             AP@0.15 (10 / NYU-37)   42.69         # 2
                                                                  AP@0.15 (NYU-37)        21.08         # 2
                                                                  AP@0.15 (10 / PNet-30)  48.74         # 1
Room Layout Estimation         SUN RGB-D   ImVoxelNet             IoU                     59.3          # 2
                                                                  Camera Pitch            2.63          # 1
                                                                  Camera Roll             1.96          # 1
