Digging Into Output Representation for Monocular 3D Object Detection

29 Sep 2021 · Liang Peng, Senbo Yan, Chenxi Huang, Xiaofei He, Deng Cai

Monocular 3D object detection aims to recognize and localize objects in 3D space from a single image. Recent research has achieved remarkable progress, yet all existing methods follow the typical output representation used in LiDAR-based 3D detection. In this paper, we argue that this discrete output representation is not suitable for monocular 3D detection. Specifically, monocular 3D detection takes only two-dimensional input but is required to produce three-dimensional detections. This characteristic makes monocular 3D detection inherently different from typical detection tasks, whose input and output have the same dimensionality. The dimension gap imposes a large lower bound on the error of the estimated depth. We therefore propose to reformulate the existing discrete output representation as a spatial probability distribution along the depth axis. This distribution accounts for the uncertainty caused by the missing depth dimension, allowing objects to be represented accurately and comprehensively in 3D space. Extensive experiments demonstrate the superiority of our output representation: applied to 12 SOTA monocular 3D detectors, it consistently boosts their average precision (AP) by roughly 20% relative improvement. The source code will be publicly available soon.
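The abstract does not spell out how the depth distribution is parameterized. As a minimal sketch of the general idea only, the snippet below assumes a Laplacian density over depth, with hypothetical network outputs `mu` (predicted depth) and `b` (predicted uncertainty), and expands a single discrete detection into a set of depth-indexed box hypotheses weighted by that density; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def depth_distribution(z, mu, b):
    """Laplacian density over depth: p(z) = exp(-|z - mu| / b) / (2b).

    `mu` is the predicted depth and `b` a predicted uncertainty scale;
    both are hypothetical network outputs in this sketch.
    """
    return np.exp(-np.abs(z - mu) / b) / (2.0 * b)

def probabilistic_boxes(box_score, mu, b, z_grid):
    """Expand one discrete detection into depth-indexed hypotheses.

    Instead of a single 3D box at depth `mu`, the object is represented
    by boxes placed along the camera ray at each depth in `z_grid`, each
    weighted by the probability mass of its depth bin.
    """
    p = depth_distribution(z_grid, mu, b)
    dz = z_grid[1] - z_grid[0]
    weights = p * dz                      # probability mass per depth bin
    scores = box_score * weights / weights.max()  # peak hypothesis keeps the original score
    return list(zip(z_grid, scores))

# Usage: a detection with score 0.9, predicted depth 25 m, uncertainty 1.5 m.
z_grid = np.linspace(20.0, 30.0, 21)
for z, s in probabilistic_boxes(0.9, 25.0, 1.5, z_grid):
    print(f"depth = {z:5.2f} m, score = {s:.3f}")
```

Running the sketch prints how the detection confidence spreads across plausible depths, which conveys the intuition behind replacing a single point estimate with a spatial probability distribution.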
