We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision.
We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids.
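To make the three point cloud representations named above concrete, here is a minimal sketch of how the same raw points might be indexed as voxels and as bird's-eye-view cells. It is not taken from M3DeTR; the helper names `voxel_index` and `bev_cells`, the grid sizes, and the sample points are all illustrative assumptions.

```python
import math

def voxel_index(point, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Map one raw (x, y, z) point to the integer index of its voxel.

    Hypothetical helper: assumes cubic voxels of side `voxel_size`.
    """
    return tuple(math.floor((p - o) / voxel_size) for p, o in zip(point, origin))

def bev_cells(points, cell_size, origin_xy=(0.0, 0.0)):
    """Project points onto the x-y plane and return the set of occupied cells.

    Hypothetical helper: the height (z) is simply discarded here.
    """
    return {
        (math.floor((x - origin_xy[0]) / cell_size),
         math.floor((y - origin_xy[1]) / cell_size))
        for x, y, _z in points
    }

# Raw representation: a plain list of (x, y, z) coordinates (made-up values).
points = [(0.1, 0.2, 0.3), (1.6, 0.4, 0.9), (3.9, 3.9, 1.5)]

# Voxel representation: one 3D grid index per point.
voxels = [voxel_index(p, voxel_size=1.0) for p in points]

# Bird's-eye-view representation: occupied 2D cells seen from above.
bev = bev_cells(points, cell_size=2.0)
```

In this toy setup the first two points fall into different voxels but the same bird's-eye-view cell, which illustrates why the representations carry complementary information at different resolutions.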
Results show that our framework achieves state-of-the-art performance at 31 FPS and improves our baseline significantly, by 9.0% mAP on the nuScenes test set.
Object detection is an essential step towards holistic scene understanding.
Ranked #157 on Object Detection on COCO test-dev
This encourages the network to preserve the geometric structure in Euclidean space throughout the feature extraction hierarchy.