Monocular Depth Estimation
398 papers with code • 25 benchmarks • 32 datasets
Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and augmented reality (AR). State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using root mean squared error (RMSE) or absolute relative error.
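The two evaluation metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not any benchmark's official evaluation script: the function name `depth_metrics`, the `min_depth` and `max_depth` defaults, and the convention of treating ground-truth pixels outside that range as missing are all assumptions made for the example.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Return RMSE and absolute relative error over valid pixels.

    Assumption: ground-truth values outside (min_depth, max_depth) mark
    pixels with no usable measurement and are excluded from the average.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Keep only pixels that have a usable ground-truth depth.
    valid = (gt > min_depth) & (gt < max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    # Clip predictions to the same range so outliers stay bounded.
    p = np.clip(pred[valid], min_depth, max_depth)
    g = gt[valid]

    rmse = np.sqrt(np.mean((p - g) ** 2))   # root mean squared error
    abs_rel = np.mean(np.abs(p - g) / g)    # absolute relative error
    return {"rmse": float(rmse), "abs_rel": float(abs_rel)}


if __name__ == "__main__":
    gt = np.array([[2.0, 4.0], [0.0, 10.0]])    # 0.0 marks a missing pixel
    pred = np.array([[1.0, 5.0], [3.0, 10.0]])
    print(depth_metrics(pred, gt))
```

RMSE is expressed in the same units as the depth map and is dominated by large errors at far range, while absolute relative error normalizes each error by the true depth, so the two are usually reported together.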
Most implemented papers
High Quality Monocular Depth Estimation via Transfer Learning
Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.
DINOv2: Learning Robust Visual Features without Supervision
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
Deeper Depth Prediction with Fully Convolutional Residual Networks
This paper addresses the problem of estimating the depth map of a scene given a single RGB image.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
Learning based methods have shown very promising results for the task of depth estimation in single images.
Digging Into Self-Supervised Monocular Depth Estimation
Per-pixel ground-truth depth data is challenging to acquire at scale.
Vision Transformers for Dense Prediction
We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks.
From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation
We show that the proposed method outperforms state-of-the-art works by a significant margin when evaluated on challenging benchmarks.
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Models and examples built with TensorFlow
AdaBins: Depth Estimation using Adaptive Bins
We address the problem of estimating a high quality dense depth map from a single RGB input image.