This paper addresses the problem of estimating the depth map of a scene given a single RGB image.
Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large, a particularly relevant result given the prevalent reliance on monocular cameras by vehicles that are expected to be self-driving.
We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure.
Deploying deep learning models as sensory information extractors in robotics can be a daunting task, even with generic GPU cards.
We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image.
Many standard robotic platforms are equipped with at least a fixed 2D laser range finder and a monocular camera.
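A common way to combine such inputs is to rasterize the sparse depth measurements into an image-sized channel and stack it with the RGB image, yielding a 4-channel "RGBd" tensor that a dense-prediction network can consume. The sketch below illustrates this input construction only; the helper name `make_rgbd_input` and the toy measurements are hypothetical, not from the original work.

```python
import numpy as np

def make_rgbd_input(rgb, depth_samples, shape):
    """Stack an RGB image with a sparse depth channel into a 4-channel
    RGBd input, a common setup for sparse-to-dense depth prediction.
    Pixels without a depth measurement are left at 0 (i.e., "no reading").
    """
    h, w = shape
    sparse = np.zeros((h, w), dtype=np.float32)
    for (r, c), d in depth_samples.items():
        sparse[r, c] = d  # place each measurement at its pixel location
    # Append the sparse depth as a fourth channel after R, G, B.
    return np.concatenate([rgb, sparse[..., None]], axis=-1)

# Hypothetical example: a 4x4 image with two depth measurements (meters).
rgb = np.random.rand(4, 4, 3).astype(np.float32)
samples = {(0, 1): 2.5, (3, 2): 7.0}
rgbd = make_rgbd_input(rgb, samples, (4, 4))
print(rgbd.shape)  # (4, 4, 4)
```

A 2D laser range finder populates only a thin band of rows in the sparse channel, which is why the zero-fill convention (rather than interpolation) is used: it lets the network learn where measurements are trustworthy.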