Local features that are robust to both viewpoint and appearance changes are crucial for many computer vision tasks.
Even when the scene is known, answering this typically requires an expensive search across scale space, with matching and geometric verification of large sets of local features.
We propose that such heavy reliance on ground-truth depths, or even on corresponding stereo pairs, is unnecessary.
We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input.
Monocular depth estimators can be trained with various forms of self-supervision from binocular-stereo data to circumvent the need for high-quality laser scans or other ground-truth data.
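The core of such stereo self-supervision is a photometric reconstruction loss: the predicted disparity is used to warp one view into the other, and the discrepancy with the real image supervises the network, with no depth labels involved. A minimal numpy sketch of that loss (nearest-neighbour sampling and the helper names are illustrative, not any particular paper's implementation):

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at
    horizontally shifted columns (nearest-neighbour for brevity)."""
    h, w = right.shape
    cols = np.arange(w)
    recon = np.empty_like(right)
    for row in range(h):
        # A pixel at column x in the left view appears at x - d in the right view.
        src = np.clip(cols - disparity[row].round().astype(int), 0, w - 1)
        recon[row] = right[row, src]
    return recon

def photometric_loss(left, right, disparity):
    """Mean absolute error between the left image and its reconstruction:
    low when the predicted disparity explains the stereo pair well."""
    return np.abs(left - warp_right_to_left(right, disparity)).mean()
```

In a real system the warp would use differentiable bilinear sampling so the loss can be backpropagated into the depth network; the principle is unchanged.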
Semi-supervised learning (SSL) partially circumvents the high cost of labeling data by augmenting a small labeled dataset with a large and relatively cheap unlabeled dataset drawn from the same distribution.
We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations.
We propose Swipe Mosaics, an interactive visualization that places the individual video frames on a 2D planar map that represents the layout of the physical scene.
Learning-based methods have shown very promising results for the task of depth estimation from a single image.
In certain settings, Expected Error Reduction has been one of the strongest-performing informativeness criteria for active learning.
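Expected Error Reduction scores each candidate query by how much labeling it is expected to shrink the model's future error: for every possible label, refit the model with that hypothetical label and measure the remaining error, weighting the outcomes by the current model's belief in each label. A small sketch under simplifying assumptions (a soft nearest-centroid classifier and mean residual uncertainty on the pool as the error proxy; all helper names are illustrative):

```python
import numpy as np

def _centroids(X, y, k):
    return np.stack([X[y == c].mean(axis=0) for c in range(k)])

def _proba(X, C):
    """Soft class probabilities from distances to class centroids
    (a stand-in for any probabilistic classifier)."""
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    e = np.exp(-d)
    return e / e.sum(axis=1, keepdims=True)

def eer_select(X_lab, y_lab, X_pool, k=2):
    """Return the pool index with the lowest expected future error
    (the Expected Error Reduction criterion)."""
    C = _centroids(X_lab, y_lab, k)
    P = _proba(X_pool, C)                      # current beliefs P(y|x)
    best_idx, best_risk = None, np.inf
    for i, x in enumerate(X_pool):
        risk = 0.0
        for c in range(k):
            # Hypothetically label x as class c and refit the model.
            X_aug = np.vstack([X_lab, x[None]])
            y_aug = np.append(y_lab, c)
            P_new = _proba(X_pool, _centroids(X_aug, y_aug, k))
            # Error proxy: mean residual uncertainty over the pool,
            # weighted by how plausible label c currently is.
            risk += P[i, c] * (1.0 - P_new.max(axis=1)).mean()
        if risk < best_risk:
            best_idx, best_risk = i, risk
    return best_idx
```

The nested refit over every candidate and every label is exactly why EER is expensive in practice, which motivates cheaper approximations.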
However, image importance is individual-specific; i.e., a teaching image is important to a student if it changes their overall ability to discriminate between classes.