M4Depth: Monocular depth estimation for autonomous vehicles in unseen environments

20 May 2021  ·  Michaël Fonder, Damien Ernst, Marc Van Droogenbroeck

Estimating the distance to objects is crucial for autonomous vehicles when depth sensors cannot be used. In this case, the distance has to be estimated from on-board RGB cameras, which is a complex task, especially in environments such as natural outdoor landscapes. In this paper, we present a new method named M4Depth for depth estimation. First, we establish a bijective relationship between depth and the visual disparity of two consecutive frames and show how to exploit it to perform motion-invariant pixel-wise depth estimation. Then, we detail M4Depth, which is based on a pyramidal convolutional neural network architecture where each level refines an input disparity map estimate by using two customized cost volumes. We use these cost volumes to leverage the visual spatio-temporal constraints imposed by motion and to make the network robust to varied scenes. We benchmarked our approach in both test and generalization modes on public datasets featuring synthetic camera trajectories recorded in a wide variety of outdoor scenes. Results show that our network outperforms the state of the art on these datasets, while also performing well on a standard depth estimation benchmark. The code of our method is publicly available at https://github.com/michael-fonder/M4Depth.

Task                        Dataset          Model             Metric Name  Metric Value  Global Rank
Monocular Depth Estimation  Mid-Air Dataset  M4Depth-d6 (VMD)  Abs Rel      0.1425        # 2
                                                               SQ Rel       3.6798        # 5
                                                               RMSE         8.8641        # 1
                                                               RMSE log     0.24571       # 2
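
For readers unfamiliar with the metric names in these results, the sketch below shows how Abs Rel, SQ Rel, RMSE and RMSE log are commonly computed in the depth estimation literature. It assumes the widely used definitions (mean relative error, mean squared relative error, root mean squared error, and root mean squared error of log depths); the listing itself does not state the exact formulas or valid-pixel masking used for this benchmark. The function name `depth_metrics` and its inputs are hypothetical and are not taken from the M4Depth code.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-estimation error metrics over paired depth values.

    `pred` and `gt` are flat sequences of predicted and ground-truth depths.
    Assumption: pairs with a non-positive value are treated as invalid and skipped.
    """
    # Keep only pixels where both depths are strictly positive.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)

    # Abs Rel: mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # SQ Rel: mean squared error relative to the ground-truth depth.
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # RMSE: root mean squared error.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # RMSE log: root mean squared error between log depths.
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "rmse_log": rmse_log}


if __name__ == "__main__":
    # Toy example with two pixels; values are illustrative only.
    print(depth_metrics([2.0, 4.0], [1.0, 4.0]))
```

Lower is better for all four metrics. Abs Rel and RMSE log are scale-relative, while RMSE is expressed in the depth unit of the dataset, so a single large error on a distant object weighs far more heavily on RMSE than on Abs Rel.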

