We show that by using a noisy teacher, which could be a standard visual odometry (VO) pipeline, and by designing a loss term that enforces geometric consistency of the trajectory, we can train accurate deep models for VO that do not require ground-truth labels.
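The exact loss is not spelled out here. As an illustration only, one common way to enforce geometric consistency along a trajectory is a composition constraint: chaining the predicted relative poses T_01 and T_12 should reproduce the directly predicted pose T_02. The sketch below assumes 4x4 SE(3) matrices with the convention T_02 = T_01 @ T_12. The function names and the `rot_weight` parameter are hypothetical and do not come from the source.

```python
import numpy as np


def make_pose(yaw: float, t) -> np.ndarray:
    """Build a 4x4 SE(3) matrix from a rotation about z (radians) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = t
    return T


def rotation_angle(R: np.ndarray) -> float:
    """Geodesic rotation angle (radians) of a 3x3 rotation matrix."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))


def composition_consistency_loss(T_01, T_12, T_02, rot_weight=1.0):
    """Penalize disagreement between the composed pose T_01 @ T_12 and a direct estimate T_02.

    Assumed convention: T_ij maps coordinates in frame j into frame i,
    so a consistent trajectory satisfies T_02 == T_01 @ T_12.
    """
    T_composed = T_01 @ T_12
    # Error transform; it equals the identity when both estimates agree.
    E = np.linalg.inv(T_02) @ T_composed
    trans_err = float(np.linalg.norm(E[:3, 3]))
    rot_err = rotation_angle(E[:3, :3])
    return trans_err + rot_weight * rot_err
```

In a training setup of this kind, all three poses would be network predictions over a short window of frames. The loss is then near zero only when they agree, which requires no ground-truth poses. `rot_weight` balances meters against radians, and its default here is arbitrary.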
During training, the network takes as input only a LiDAR point cloud, the corresponding monocular image, and the camera calibration matrix K. At train time, we do not impose direct supervision (i.e., we do not, for example, directly regress to the calibration parameters).
The proposed approach significantly improves on the state of the art for monocular object localization on arbitrarily shaped roads.
This paper introduces costs based on geometry and on object shape and pose for multi-object tracking in urban driving scenarios.
Ranked #2 on 3D Multi-Object Tracking on KITTI
These category models are instance-independent and aid in the design of object landmark observations that can be incorporated into a generic monocular SLAM framework.
We then formulate a shape-aware adjustment problem that uses the learnt shape priors to recover the 3D pose and shape of a query object from an image.
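The passage does not give the formulation. As an illustration only, a widely used form of learnt shape prior is a linear model: a category mean shape plus a weighted sum of basis deformations. The sketch below shows only the shape step of such an adjustment. It substitutes a simplified weak-perspective camera for full perspective projection with K and holds the pose fixed. Under those assumptions, the shape coefficients follow from a ridge-regularized linear least-squares solve. A full shape-aware adjustment would also optimize pose, typically under perspective projection. All names here are hypothetical.

```python
import numpy as np


def deformed_shape(mean_shape, basis, coeffs):
    """Linear shape prior: mean_shape (N, 3) plus sum_b coeffs[b] * basis[b], basis is (B, N, 3)."""
    return mean_shape + np.tensordot(coeffs, basis, axes=1)


def weak_perspective_project(X, R, scale, t2d):
    """Rotate (N, 3) points, drop depth, scale, then shift in the image plane."""
    return scale * (X @ R.T)[:, :2] + t2d


def fit_shape_coeffs(keypoints_2d, mean_shape, basis, R, scale, t2d, reg=1e-3):
    """Estimate shape coefficients with the pose held fixed.

    Under weak perspective the projection is linear in the coefficients, so this is
    ridge-regularized least squares; `reg` keeps the shape close to the category mean.
    """
    num_basis = basis.shape[0]
    # Column b holds the projected contribution of basis shape b (translation cancels out).
    A = np.stack(
        [(scale * (basis[b] @ R.T)[:, :2]).ravel() for b in range(num_basis)], axis=1
    )
    residual = (keypoints_2d - weak_perspective_project(mean_shape, R, scale, t2d)).ravel()
    # Normal equations: (A^T A + reg * I) coeffs = A^T residual
    lhs = A.T @ A + reg * np.eye(num_basis)
    return np.linalg.solve(lhs, A.T @ residual)
```

Because the shape coefficients are category-level rather than per-instance, the same mean and basis can serve every object of that category. This matches the instance-independent category models described above.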