Depth Estimation

822 papers with code • 14 benchmarks • 70 datasets

Depth Estimation is the task of estimating, for each pixel, the distance from the camera to the scene point it depicts. Depth is estimated from either monocular (single-view) or stereo (two or more views of a scene) images. Traditional methods use multi-view geometry to relate the images to one another. Newer methods estimate depth directly by minimizing a regression loss, or by learning to synthesize a novel view from a sequence of images. The most popular benchmarks are KITTI and NYUv2. Models are typically evaluated with a root-mean-square (RMS) error metric.
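As a rough illustration of the RMS metric mentioned above, here is a minimal sketch in Python with NumPy. The function name `rmse` and the convention that ground-truth values of zero or less mark missing pixels are assumptions made for this example; individual benchmarks define their own validity masks, depth caps, and crops.

```python
import numpy as np


def rmse(pred, gt, valid_mask=None):
    """Root-mean-square error between predicted and ground-truth depth maps.

    Assumption for this sketch: if no mask is given, pixels whose
    ground-truth depth is <= 0 are treated as missing and ignored.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid_mask is None:
        valid_mask = gt > 0
    # Compare only the pixels that have a usable ground-truth value.
    diff = pred[valid_mask] - gt[valid_mask]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    gt = np.array([[1.0, 2.0], [0.0, 4.0]])    # 0.0 marks a missing pixel
    pred = np.array([[1.5, 2.0], [9.0, 3.0]])  # the 9.0 is ignored
    print(rmse(pred, gt))
```

The mask matters in practice because sensor-derived ground truth is usually sparse, so averaging over every pixel would count missing measurements as errors.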

Source: DIODE: A Dense Indoor and Outdoor DEpth Dataset



Most implemented papers

High Quality Monocular Depth Estimation via Transfer Learning

ialhashim/DenseDepth 31 Dec 2018

Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.

Deeper Depth Prediction with Fully Convolutional Residual Networks

iro-cp/FCRN-DepthPrediction 1 Jun 2016

This paper addresses the problem of estimating the depth map of a scene given a single RGB image.

Unsupervised Monocular Depth Estimation with Left-Right Consistency

mrharicot/monodepth CVPR 2017

Learning-based methods have shown very promising results for the task of depth estimation in single images.

Vision Transformers for Dense Prediction

isl-org/DPT ICCV 2021

We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.

Digging Into Self-Supervised Monocular Depth Estimation

nianticlabs/monodepth2 4 Jun 2018

Per-pixel ground-truth depth data is challenging to acquire at scale.

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

intel-isl/MiDaS 2 Jul 2019

In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks.
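One common way to make an error measure invariant to depth range and scale is to fit a per-image scale and shift by least squares before comparing prediction and target. The sketch below shows that idea only; it is not claimed to be the paper's exact training objective, and the function names `align_scale_shift` and `scale_shift_invariant_mse` are hypothetical names chosen for this example.

```python
import numpy as np


def align_scale_shift(pred, target, mask=None):
    """Least-squares scale s and shift t minimizing ||s * pred + t - target||^2.

    Hypothetical helper for illustration; only pixels selected by `mask`
    (all pixels if mask is None) take part in the fit.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    if mask is None:
        mask = np.ones(pred.shape, dtype=bool)
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    # Design matrix with columns [pred, 1] so the solution is (s, t).
    A = np.stack([pred[mask], np.ones(int(mask.sum()))], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target[mask], rcond=None)
    return float(s), float(t), mask


def scale_shift_invariant_mse(pred, target, mask=None):
    """Mean squared error after optimally aligning pred to target."""
    s, t, mask = align_scale_shift(pred, target, mask)
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    residual = s * pred[mask] + t - target[mask]
    return float(np.mean(residual ** 2))


if __name__ == "__main__":
    pred = np.array([0.0, 1.0, 2.0, 3.0])
    target = np.array([1.0, 0.0, 2.0, 5.0])
    # Rescaling and shifting the prediction leaves the error unchanged.
    print(scale_shift_invariant_mse(pred, target))
    print(scale_shift_invariant_mse(5.0 * pred + 1.0, target))
```

Because the alignment absorbs any global scale and offset, datasets whose depth annotations differ in units or range can be compared or mixed under the same measure.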

From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation

cogaplex-bts/bts 24 Jul 2019

We show that the proposed method outperforms state-of-the-art methods by a significant margin on challenging benchmarks.

DINOv2: Learning Robust Visual Features without Supervision

facebookresearch/dinov2 14 Apr 2023

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.

Efficient Attention: Attention with Linear Complexities

cmsflash/efficient-attention 4 Dec 2018

Dot-product attention has wide applications in computer vision and natural language processing.