Large-scale Monocular Depth Estimation in the Wild

Estimating the relative depth of a single image (Monocular Depth Estimation) is a significant step towards understanding the general structure of the depicted scene, the relations between entities in it, and their interactions. Without stereo image pairs, the task depends on the availability of large-scale depth datasets and high-capacity models that can capture the intrinsic nature of depth. Unfortunately, creating large-scale depth datasets is not a trivial task. To overcome this limitation, this work proposes a new approach for accumulating depth and surface normal datasets from a variety of video games in an easy and reproducible way. It also introduces a new loss function that better incorporates the relation between the depth and the surface normals of a scene, which results in higher-quality depth estimates that also produce more uniform surface normals. Qualitative and quantitative comparisons against the best approaches of the last nine years show performance competitive with the state of the art while using several times fewer parameters. An ablation study further demonstrates the effectiveness of the approach. Experiments on this dataset show that using the new loss function alongside synthetic datasets increases accuracy on "Monocular Depth Estimation in the Wild" tasks, where other approaches usually fail to generalize.
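
The exact formulation of the depth-normal loss is not given on this page. As a hedged illustration, surface normals can be derived from a predicted depth map via finite differences and penalized against ground-truth normals alongside an ordinary depth term. The PyTorch sketch below is an assumption-laden example: the names `normals_from_depth` and `depth_normal_loss`, the orthographic simplification, and the weighting `alpha` are all illustrative, not the paper's published method.

```python
import torch
import torch.nn.functional as F

def normals_from_depth(depth: torch.Tensor) -> torch.Tensor:
    """Approximate surface normals from a depth map via finite differences.

    depth: (B, 1, H, W) tensor of depth values.
    Returns unit normals of shape (B, 3, H, W).
    Assumes a simplified orthographic model; a full implementation would
    unproject pixels with the camera intrinsics first.
    """
    # Forward differences in x and y, padded to keep the input resolution.
    dzdx = F.pad(depth[..., :, 1:] - depth[..., :, :-1], (0, 1))
    dzdy = F.pad(depth[..., 1:, :] - depth[..., :-1, :], (0, 0, 0, 1))
    normals = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)
    return F.normalize(normals, dim=1)

def depth_normal_loss(pred_depth, gt_depth, gt_normals, alpha=0.5):
    """Combined objective: L1 depth error plus a normal-consistency term.

    The cosine formulation and the weight `alpha` are assumptions,
    not the paper's loss.
    """
    depth_term = F.l1_loss(pred_depth, gt_depth)
    pred_normals = normals_from_depth(pred_depth)
    # 1 - cosine similarity penalizes angular deviation of the normals.
    normal_term = (1.0 - (pred_normals * gt_normals).sum(dim=1)).mean()
    return depth_term + alpha * normal_term
```

Coupling the two terms this way lets the normal term act as a smoothness prior on local depth gradients, which is consistent with the abstract's claim of more uniform surface normals.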


Results from the Paper


Ranked #42 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data |
|---|---|---|---|---|---|---|
| Monocular Depth Estimation | NYU-Depth V2 | Gaming for Depth (GfD) | RMSE | 0.364 | #42 | Yes |
| | | | absolute relative error | 0.080 | #17 | |
| | | | Delta < 1.25 | 0.931 | #26 | |
| | | | Delta < 1.25^2 | 0.986 | #34 | |
| | | | Delta < 1.25^3 | 0.996 | #37 | |
| | | | log10 | 0.033 | #16 | |
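
These are the standard NYU-Depth V2 evaluation metrics. A minimal NumPy sketch of how they are conventionally computed (not the authors' evaluation code) is:

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Standard monocular depth metrics as reported on NYU-Depth V2.

    pred, gt: flat arrays of positive depth values over valid pixels.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),
        "abs_rel": np.mean(np.abs(pred - gt) / gt),
        "log10": np.mean(np.abs(np.log10(pred) - np.log10(gt))),
        "delta1": np.mean(ratio < 1.25),       # Delta < 1.25
        "delta2": np.mean(ratio < 1.25 ** 2),  # Delta < 1.25^2
        "delta3": np.mean(ratio < 1.25 ** 3),  # Delta < 1.25^3
    }
```

The delta metrics report the fraction of pixels whose predicted depth is within the given ratio of the ground truth, so higher is better; RMSE, absolute relative error, and log10 are error measures, so lower is better.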

Methods