Monocular Depth Estimation

339 papers with code • 17 benchmarks • 26 datasets

Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for determining scene understanding for applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.

Source: Defocus Deblurring Using Dual-Pixel Data

Benchmarks

Add a Result

These leaderboards are used to track progress in Monocular Depth Estimation

Dataset	Best Model	Compare
NYU-Depth V2	Metric3Dv2(L, FT)	See all
KITTI Eigen split	Metric3Dv2 (g2, FT, 80m, flip_aug_test)	See all
KITTI Eigen split unsupervised	SQLdepth (ConvNeXt-L)	See all
NYU-Depth V2 self-supervised	IndoorDepth	See all
Mid-Air Dataset	M4Depth+U	See all
Make3D	GCNDepth	See all
IBims-1	Miangoleh et al. (SGR)	See all
DDAD	AFNet	See all
VA (Virtual Apartment)	DistDepth	See all
Middlebury 2014	Miangoleh et al. (MiDaS)	See all
KITTI	MonoViT	See all
SUN-RGBD	RPSF	See all
Cityscapes	SwinMTL	See all
UASOL	FCRN-DepthPrediction from Iro Laina et al. (2016)	See all
KITTI Object Tracking Evaluation 2012	PackNet-SfM	See all
Matterport3D	NeWCRFs	See all
Cityscapes 3D	TaskPrompter	See all

Show all 17 benchmarks

Collapse benchmarks

Libraries

Use these libraries to find Monocular Depth Estimation models and implementations

huggingface/transformers

3 papers

124,984

SeokjuLee/Insta-DM

3 papers

221

ShuweiShao/NDDepth

3 papers

Datasets

Most implemented papers

Most implemented Social Latest No code

High Quality Monocular Depth Estimation via Transfer Learning

ialhashim/DenseDepth • • 31 Dec 2018

Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.

Paper
Code

Deeper Depth Prediction with Fully Convolutional Residual Networks

iro-cp/FCRN-DepthPrediction • • 1 Jun 2016

This paper addresses the problem of estimating the depth map of a scene given a single RGB image.

Paper
Code

Unsupervised Monocular Depth Estimation with Left-Right Consistency

mrharicot/monodepth • • CVPR 2017

Learning based methods have shown very promising results for the task of depth estimation in single images.

Paper
Code

Vision Transformers for Dense Prediction

isl-org/DPT • • ICCV 2021

We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.

Paper
Code

Digging Into Self-Supervised Monocular Depth Estimation

nianticlabs/monodepth2 • • 4 Jun 2018

Per-pixel ground-truth depth data is challenging to acquire at scale.

Paper
Code

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

intel-isl/MiDaS • • 2 Jul 2019

In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks.

Paper
Code

From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation

cogaplex-bts/bts • • 24 Jul 2019

We show that the proposed method outperforms the state-of-the-art works with significant margin evaluating on challenging benchmarks.

Paper
Code

Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

tensorflow/models • • 15 Nov 2018

Models and examples built with TensorFlow

Paper
Code

AdaBins: Depth Estimation using Adaptive Bins

shariqfarooq123/AdaBins • • CVPR 2021

We address the problem of estimating a high quality dense depth map from a single RGB input image.

Paper
Code

DINOv2: Learning Robust Visual Features without Supervision

facebookresearch/dinov2 • • 14 Apr 2023

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.

Paper
Code

Monocular Depth Estimation

Benchmarks Add a Result

Libraries

Datasets

Most implemented papers

Content

Benchmarks

Add a Result