Monocular Depth Estimation

337 papers with code • 18 benchmarks • 26 datasets

Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a complex network powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.
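The two metrics mentioned above are straightforward to compute. As a minimal sketch (the function name and the convention of masking out pixels without ground truth are assumptions, not from a specific benchmark toolkit):

```python
import numpy as np

def depth_metrics(pred, gt):
    """RMSE and absolute relative error over valid ground-truth pixels."""
    mask = gt > 0                                # skip pixels with no ground truth
    pred, gt = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))    # root mean squared error
    abs_rel = np.mean(np.abs(pred - gt) / gt)    # absolute relative error
    return rmse, abs_rel
```

Real benchmark evaluations (e.g. on KITTI) additionally cap depths to a maximum range and often median-scale scale-ambiguous predictions before computing these numbers.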

Source: Defocus Deblurring Using Dual-Pixel Data

Most implemented papers

Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras

google-research/google-research ICCV 2019

We present a novel method for simultaneous learning of depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as supervision signal.
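The "consistency across neighboring video frames" supervision used by this family of self-supervised methods boils down to a photometric reconstruction error: a neighboring frame is warped into the target view using the predicted depth, egomotion, and intrinsics, and the loss penalizes the difference. A minimal sketch of just the loss (the warping step, and the function names, are assumed to happen elsewhere; published methods typically combine this with an SSIM term and a smoothness regularizer):

```python
import numpy as np

def photometric_consistency(target, warped_neighbor):
    """Mean absolute photometric error between a target frame and a
    neighboring frame warped into the target view; minimizing this
    jointly trains the depth, egomotion, and intrinsics networks."""
    return np.mean(np.abs(target - warped_neighbor))
```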

3D Packing for Self-Supervised Monocular Depth Estimation

TRI-ML/packnet-sfm CVPR 2020

Although cameras are ubiquitous, robotic platforms typically rely on active sensors like LiDAR for direct 3D perception.

Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

B1ueber2y/TrianFlow CVPR 2020

In this work, we tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.

S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

microsoft/S2R-DepthNet CVPR 2021

S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling it into domain-invariant structure and domain-specific style components; b) a Depth-specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization; and c) a depth prediction (DP) module, which predicts depth from the depth-specific representation.

Enforcing geometric constraints of virtual normal for depth prediction

aim-uofa/AdelaiDepth ICCV 2019

Monocular depth prediction plays a crucial role in understanding 3D scene geometry.

Digging Into Self-Supervised Monocular Depth Estimation

nianticlabs/monodepth2 ICCV 2019

Per-pixel ground-truth depth data is challenging to acquire at scale.

Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction

aim-uofa/AdelaiDepth 7 Mar 2021

In this work, we show the importance of the high-order 3D geometric constraints for depth prediction.
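The high-order constraint at the heart of this line of work is the "virtual normal": triplets of non-colinear 3D points are sampled from the point cloud reconstructed from the depth map, and the normal of the plane each triplet spans is compared between prediction and ground truth. A minimal sketch of computing one such normal (the function name is an assumption; the full loss samples many triplets and penalizes normal differences):

```python
import numpy as np

def virtual_normal(p0, p1, p2):
    """Unit normal of the plane spanned by three non-colinear 3D points,
    e.g. points back-projected from a depth map with camera intrinsics."""
    n = np.cross(p1 - p0, p2 - p0)   # plane normal via the cross product
    return n / np.linalg.norm(n)     # normalize to unit length
```

Because the normal depends on the relative geometry of widely separated points, matching virtual normals constrains global 3D shape in a way per-pixel depth losses do not.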

Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth

vinvino02/GLPDepth 19 Jan 2022

Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks.

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

isl-org/ZoeDepth 23 Feb 2023

ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance, achieving unprecedented zero-shot generalization to eight unseen datasets from both indoor and outdoor domains.

Neural Video Depth Stabilizer

raymondwang987/nvds ICCV 2023

Video depth estimation aims to infer temporally consistent depth.