Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection

Fusion of multiple sensor modalities such as camera, Lidar, and Radar, which are commonly found on autonomous vehicles, not only allows for accurate detection but also makes perception robust against adverse weather conditions and individual sensor failures. Due to inherent sensor characteristics, Radar performs well under extreme weather conditions (snow, rain, fog) that significantly degrade camera and Lidar. Recently, a few works have developed vehicle detection methods that fuse Lidar and Radar signals, e.g., MVDNet. However, these models are typically developed under the assumption that two error-free sensor streams are always available; if one of the sensors is unavailable or missing, the model may fail catastrophically. To mitigate this problem, we propose the Self-Training Multimodal Vehicle Detection Network (ST-MVDNet), which leverages a Teacher-Student mutual learning framework and a simulated sensor noise model used as strong data augmentation for Lidar and Radar. We show that by (1) enforcing output consistency between a Teacher network and a Student network and (2) introducing missing modalities (strong augmentations) during training, our learned model breaks away from the error-free sensor assumption. This consistency enforcement enables the Student model to handle missing data properly and improves the Teacher model by updating it with the exponential moving average of the Student model's weights. Our experiments demonstrate that the proposed learning framework for multi-modal detection handles missing sensor data at inference time substantially better. Furthermore, our method achieves new state-of-the-art performance (a 5% gain) on the Oxford Radar RobotCar dataset under various evaluation settings.
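
The abstract names three ingredients: modality-dropping strong augmentation for the Student, an output-consistency term between Teacher and Student, and an EMA update of the Teacher from the Student. The sketch below illustrates how these pieces typically fit together in a training step; it is not the authors' implementation, and the dropout probability, the MSE consistency term, and the `student(lidar, radar)` call signature are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

def drop_modality(lidar, radar, p_drop=0.3):
    """Strong augmentation: randomly zero out one sensor stream to
    simulate a missing modality (illustrative noise model)."""
    if random.random() < p_drop:
        if random.random() < 0.5:
            lidar = torch.zeros_like(lidar)
        else:
            radar = torch.zeros_like(radar)
    return lidar, radar

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    """Teacher weights track the exponential moving average of the
    Student weights (Mean-Teacher-style update)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)

def training_step(student, teacher, lidar, radar, targets,
                  detection_loss, optimizer):
    # Teacher sees the clean sensor pair.
    with torch.no_grad():
        teacher_pred = teacher(lidar, radar)

    # Student sees the strongly augmented pair, possibly with a missing modality.
    lidar_aug, radar_aug = drop_modality(lidar, radar)
    student_pred = student(lidar_aug, radar_aug)

    # Supervised detection loss plus Teacher-Student output consistency
    # (MSE here stands in for the detection-level consistency term).
    loss = detection_loss(student_pred, targets) \
         + F.mse_loss(student_pred, teacher_pred)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Teacher is improved by following the Student's EMA.
    ema_update(teacher, student)
    return loss.item()
```

In this kind of setup the Teacher is usually initialized as a copy of the Student and never receives gradients directly; it is updated only through the EMA step, which is what lets it provide stable targets for the consistency term.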
