Learning to Recover 3D Scene Shape from a Single Image

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape, due to an unknown depth shift induced by the shift-invariant reconstruction losses used in mixed-data depth prediction training, and a possibly unknown camera focal length. We investigate this problem in detail and propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length, allowing us to recover a realistic 3D scene shape. In addition, we propose an image-level normalized regression loss and a normal-based geometry loss to enhance depth prediction models trained on mixed datasets. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot dataset generalization. Code is available at: https://git.io/Depth
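The role of the recovered shift and focal length can be seen in the back-projection step: a depth map predicted only up to scale and shift cannot be unprojected into a plausible point cloud until both unknowns are fixed. Below is a minimal sketch of that unprojection, assuming a pinhole camera with the principal point at the image center; `unproject_to_point_cloud`, `depth_pred`, `shift`, and `focal_px` are hypothetical placeholders for the outputs of the depth network and the point cloud modules, not the paper's actual API, and the global scale remains unknown.

```python
import numpy as np

def unproject_to_point_cloud(depth_pred, shift, focal_px):
    """Back-project an affine-invariant depth map into a 3D point cloud.

    depth_pred: (H, W) depth predicted up to an unknown scale and shift.
    shift:      scalar depth shift (here assumed recovered by a point cloud module).
    focal_px:   focal length in pixels (likewise assumed recovered).
    """
    h, w = depth_pred.shape
    cx, cy = w / 2.0, h / 2.0        # assumption: principal point at image center
    depth = depth_pred + shift       # undo the unknown shift (scale stays unknown)

    # Pixel grid: u indexes columns, v indexes rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / focal_px  # pinhole back-projection
    y = (v - cy) * depth / focal_px
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

An incorrect shift or focal length warps this point cloud (e.g., planar walls bow or stretch), which is the kind of geometric distortion cue that point cloud encoders can learn to detect.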


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Depth Estimation | DIODE | LeReS | Delta < 1.25 | 0.234 | #2 | |
| Indoor Monocular Depth Estimation | DIODE | LeReS | Delta < 1.25^3 | 0.900 | #1 | Yes |
| Monocular Depth Estimation | KITTI Eigen split | LeReS | absolute relative error | 0.149 | #68 | |
| Monocular Depth Estimation | KITTI Eigen split | LeReS | Delta < 1.25 | 0.784 | #40 | |
| Monocular Depth Estimation | NYU-Depth V2 | LeReS | absolute relative error | 0.09 | #26 | |
| Monocular Depth Estimation | NYU-Depth V2 | LeReS | Delta < 1.25 | 0.916 | #33 | |
| Depth Estimation | ScanNetV2 | LeReS | absolute relative error | 0.095 | #2 | |
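The metrics in the table are the standard monocular depth measures: absolute relative error and the threshold accuracy Delta < t (the fraction of pixels whose prediction/ground-truth ratio is within a factor t). A minimal sketch of how they are typically computed is below; `depth_metrics` is a hypothetical helper, and any scale (or scale-and-shift) alignment of an affine-invariant prediction with the ground truth, which these benchmarks may require, is assumed to have been done beforehand.

```python
import numpy as np

def depth_metrics(pred, gt, threshold=1.25):
    """Standard monocular depth metrics over valid (gt > 0) pixels."""
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]

    abs_rel = np.mean(np.abs(pred - gt) / gt)   # absolute relative error
    ratio = np.maximum(pred / gt, gt / pred)    # symmetric prediction/GT ratio
    delta1 = np.mean(ratio < threshold)         # Delta < 1.25
    delta3 = np.mean(ratio < threshold ** 3)    # Delta < 1.25^3
    return abs_rel, delta1, delta3
```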
