Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection

14 Nov 2022 · Yu Hong, Hang Dai, Yong Ding

Leveraging LiDAR-based detectors or real LiDAR point data to guide monocular 3D detection has brought significant improvement, e.g., Pseudo-LiDAR methods. However, existing methods usually apply non-end-to-end training strategies and leverage the LiDAR information insufficiently, leaving the rich potential of the LiDAR data underexploited. In this paper, we propose the Cross-Modality Knowledge Distillation (CMKD) network for monocular 3D detection to efficiently and directly transfer knowledge from the LiDAR modality to the image modality on both features and responses. Moreover, we further extend CMKD into a semi-supervised training framework by distilling knowledge from large-scale unlabeled data, significantly boosting performance. As of submission, CMKD ranks $1^{st}$ among published monocular 3D detectors on both the KITTI $test$ set and the Waymo $val$ set, with significant performance gains over previous state-of-the-art methods.
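The abstract describes distillation at two levels: aligning the image branch's features to the LiDAR branch's features, and matching the two branches' detection responses. The sketch below illustrates that general two-term loss structure in PyTorch; it is an assumption-laden illustration of feature-plus-response distillation, not the paper's exact formulation (the function name, weights, and tensor shapes are hypothetical).

```python
import torch
import torch.nn.functional as F

def cmkd_style_loss(student_feat, teacher_feat, student_resp, teacher_resp,
                    w_feat=1.0, w_resp=1.0, temperature=1.0):
    """Illustrative cross-modality distillation loss (hypothetical sketch,
    not the paper's exact objective).

    student_feat / student_resp: features and head outputs of the image branch.
    teacher_feat / teacher_resp: features and head outputs of the LiDAR branch.
    """
    # Feature-level distillation: pull image-branch features toward the
    # (detached) LiDAR-branch features.
    loss_feat = F.mse_loss(student_feat, teacher_feat.detach())

    # Response-level distillation: soften both heads with a temperature and
    # match the student's distribution to the teacher's via KL divergence.
    t = temperature
    loss_resp = F.kl_div(
        F.log_softmax(student_resp / t, dim=-1),
        F.softmax(teacher_resp.detach() / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    return w_feat * loss_feat + w_resp * loss_resp
```

In a semi-supervised setting like the one the abstract mentions, a loss of this shape could be applied to unlabeled images as well, since neither term requires ground-truth boxes, only the LiDAR teacher's outputs.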


Results from the Paper


 Ranked #1 on Monocular 3D Object Detection on KITTI Cyclist Hard (using extra training data)

Task                          | Dataset                  | Model | Metric    | Value | Global Rank
Monocular 3D Object Detection | KITTI Cars Easy          | CMKD  | AP Easy   | 28.55 | #2
Monocular 3D Object Detection | KITTI Cars Hard          | CMKD  | AP Hard   | 16.77 | #2
Monocular 3D Object Detection | KITTI Cyclist Easy       | CMKD  | AP Easy   | 12.52 | #1
Monocular 3D Object Detection | KITTI Cyclist Hard       | CMKD  | AP Hard   |  6.34 | #1
Monocular 3D Object Detection | KITTI Cyclist Moderate   | CMKD  | AP Medium |  6.67 | #1
Monocular 3D Object Detection | KITTI Pedestrian Easy    | CMKD  | AP Easy   | 13.94 | #1
Monocular 3D Object Detection | KITTI Pedestrian Hard    | CMKD  | AP Hard   |  7.42 | #2
Monocular 3D Object Detection | KITTI Pedestrian Moderate| CMKD  | AP Medium |  8.79 | #2