From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation

24 Jul 2019  ·  Jin Han Lee, Myung-Kyu Han, Dong Wook Ko, Il Hong Suh ·

Estimating accurate depth from a single image is challenging because it is an ill-posed problem: infinitely many 3D scenes can be projected to the same 2D image. However, recent works based on deep convolutional neural networks show great progress with plausible results. These networks are generally composed of two parts: an encoder for dense feature extraction and a decoder for predicting the desired depth. In such encoder-decoder schemes, repeated strided convolution and spatial pooling layers lower the spatial resolution of intermediate outputs, and several techniques such as skip connections or multi-layer deconvolutional networks are adopted to recover the original resolution for effective dense prediction. In this paper, to guide densely encoded features more effectively toward the desired depth prediction, we propose a network architecture that utilizes novel local planar guidance layers located at multiple stages of the decoding phase. We show that the proposed method outperforms state-of-the-art works by a significant margin when evaluated on challenging benchmarks. We also provide results from an ablation study to validate the effectiveness of the proposed method.
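As a rough illustration of the local planar guidance (LPG) idea, each decoder stage can predict a 4-parameter local plane per coarse cell and expand it to full resolution via ray-plane intersection. The sketch below is a minimal NumPy rendering of that expansion step only; the function name, shapes, and the simplified plane parameterization are illustrative assumptions, not the paper's exact implementation (which, for example, constrains the normal to unit length via predicted angles).

```python
import numpy as np

def local_planar_guidance(plane_coef, k):
    """Expand coarse per-cell plane estimates to a full-resolution depth map.

    Each cell of the (H, W, 4) grid holds plane parameters (n1, n2, n3, n4);
    for a pixel at normalized in-cell coordinates (u, v), the ray-plane
    intersection gives depth = n4 / (n1*u + n2*v + n3).
    Returns an (H*k, W*k) depth map, where k is the upscaling ratio.
    """
    H, W, _ = plane_coef.shape
    # Repeat each cell's parameters over its k x k output patch.
    n = np.repeat(np.repeat(plane_coef, k, axis=0), k, axis=1)
    # Normalized pixel-center coordinates within each cell, in (0, 1).
    offs = (np.arange(k) + 0.5) / k
    u = np.tile(offs, H)[:, None]   # row coordinate, shape (H*k, 1)
    v = np.tile(offs, W)[None, :]   # column coordinate, shape (1, W*k)
    return n[..., 3] / (n[..., 0] * u + n[..., 1] * v + n[..., 2])

# Fronto-parallel planes (n1 = n2 = 0, n3 = 1) at distance 5 in every cell
# expand to a constant depth map of 5 at k-times the resolution.
coarse = np.zeros((2, 3, 4))
coarse[..., 2] = 1.0
coarse[..., 3] = 5.0
depth = local_planar_guidance(coarse, k=4)
print(depth.shape)  # (8, 12)
```

In the paper, such expansions from multiple decoder stages (different k) are combined to form the final depth prediction; the sketch above shows only a single stage under the stated assumptions.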



Results from the Paper

Task                        Dataset            Model  Metric Name              Metric Value  Global Rank
Monocular Depth Estimation  KITTI Eigen split  BTS    absolute relative error  0.064         # 21
Monocular Depth Estimation  NYU-Depth V2       BTS    RMSE                     0.392         # 32
Monocular Depth Estimation  NYU-Depth V2       BTS    absolute relative error  0.110         # 31
Monocular Depth Estimation  NYU-Depth V2       BTS    Delta < 1.25             0.885         # 32
Monocular Depth Estimation  NYU-Depth V2       BTS    Delta < 1.25^2           0.978         # 31
Monocular Depth Estimation  NYU-Depth V2       BTS    Delta < 1.25^3           0.995         # 27
Monocular Depth Estimation  NYU-Depth V2       BTS    log10                    0.047         # 31

Results from Other Papers

Task              Dataset       Model  Metric Name  Metric Value  Rank
Depth Estimation  NYU-Depth V2  BTS    RMS          0.407         # 8