Monocular Depth Estimation
337 papers with code • 18 benchmarks • 26 datasets
Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.
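The two standard metrics mentioned above are straightforward to compute. A minimal sketch (the function name, mask convention, and treatment of invalid pixels are illustrative assumptions, not a specific benchmark's official evaluation code):

```python
import numpy as np

def depth_metrics(pred, gt, mask=None):
    """RMSE and absolute relative error between predicted and
    ground-truth depth maps, over valid pixels only.

    pred, gt: arrays of per-pixel depth in the same units (e.g. meters).
    mask: boolean array of valid pixels; by default, pixels with
          non-positive ground truth are treated as invalid (a common
          convention for sparse LiDAR ground truth -- an assumption here).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0
    p, g = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((p - g) ** 2))      # root mean squared error
    abs_rel = np.mean(np.abs(p - g) / g)       # absolute relative error
    return rmse, abs_rel
```

Note that benchmark protocols often add further details (depth capping, eval crops, median scaling for self-supervised methods), so published numbers are not directly reproducible from this sketch alone.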
Most implemented papers
Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras
We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as the supervision signal.
3D Packing for Self-Supervised Monocular Depth Estimation
Although cameras are ubiquitous, robotic platforms typically rely on active sensors like LiDAR for direct 3D perception.
Towards Better Generalization: Joint Depth-Pose Learning without PoseNet
In this work, we tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components; b) a Depth-specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization; and c) a Depth Prediction (DP) module to predict depth from the depth-specific representation.
Enforcing geometric constraints of virtual normal for depth prediction
Monocular depth prediction plays a crucial role in understanding 3D scene geometry.
Digging Into Self-Supervised Monocular Depth Estimation
Per-pixel ground-truth depth data is challenging to acquire at scale.
Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction
In this work, we show the importance of the high-order 3D geometric constraints for depth prediction.
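The core geometric object behind the virtual-normal constraint can be illustrated with a minimal sketch: reconstruct 3D points from the depth map, sample three non-collinear ones, and take the unit normal of the plane they span; matching these normals between prediction and ground truth enforces a higher-order constraint than per-pixel depth losses. The function below is an illustration of that geometric primitive, not the authors' implementation (sampling strategy, collinearity threshold, and loss formulation are all assumptions):

```python
import numpy as np

def virtual_normal(p0, p1, p2, eps=1e-8):
    """Unit normal of the 'virtual plane' through three 3D points.

    p0, p1, p2: length-3 arrays of 3D point coordinates (e.g. pixels
    back-projected through the camera intrinsics using their depths).
    Raises if the points are (nearly) collinear, since the plane is
    then ill-defined.
    """
    p0, p1, p2 = (np.asarray(p, dtype=np.float64) for p in (p0, p1, p2))
    n = np.cross(p1 - p0, p2 - p0)   # normal via the cross product
    norm = np.linalg.norm(n)
    if norm < eps:
        raise ValueError("points are (nearly) collinear")
    return n / norm
```

A loss in this spirit would compare `virtual_normal` of point triplets reconstructed from predicted depth against the same triplets reconstructed from ground-truth depth.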
Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth
Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks.
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance, and it achieves unprecedented zero-shot generalization performance on eight unseen datasets from both indoor and outdoor domains.
Neural Video Depth Stabilizer
Video depth estimation aims to infer temporally consistent depth.