Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth

29 Mar 2022 · Ziyue Feng, Liang Yang, Longlong Jing, HaiYan Wang, YingLi Tian, Bing Li

Conventional self-supervised monocular depth prediction methods are based on a static environment assumption, which leads to accuracy degradation in dynamic scenes due to the mismatch and occlusion problems introduced by object motion. Existing dynamic-object-focused methods only partially solve the mismatch problem, and only at the training loss level. In this paper, we propose a novel multi-frame monocular depth prediction method that addresses these problems at both the prediction and supervision loss levels. Our method, called DynamicDepth, is a new framework trained via a self-supervised cycle-consistent learning scheme. A Dynamic Object Motion Disentanglement (DOMD) module is proposed to disentangle object motion and thereby solve the mismatch problem. Moreover, a novel occlusion-aware Cost Volume and Re-projection Loss are designed to alleviate the occlusion effects of object motion. Extensive analyses and experiments on the Cityscapes and KITTI datasets show that our method significantly outperforms state-of-the-art monocular depth prediction methods, especially in areas containing dynamic objects. Code is available at https://github.com/AutoAILab/DynamicDepth

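| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Unsupervised Monocular Depth Estimation | Cityscapes | DynamicDepth | RMSE | 5.867 | # 3 |
| Unsupervised Monocular Depth Estimation | Cityscapes | DynamicDepth | RMSE log | 0.157 | # 4 |
| Unsupervised Monocular Depth Estimation | Cityscapes | DynamicDepth | Square relative error (SqRel) | 1.000 | # 3 |
| Unsupervised Monocular Depth Estimation | Cityscapes | DynamicDepth | Absolute relative error (AbsRel) | 0.103 | # 4 |
| Unsupervised Monocular Depth Estimation | Cityscapes | DynamicDepth | Test frames | 2 | # 7 |
| Unsupervised Monocular Depth Estimation | KITTI Eigen Split Improved Ground Truth | DynamicDepth | absolute relative error | 0.068 | # 1 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | absolute relative error | 0.096 | # 9 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | RMSE | 4.458 | # 16 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | Sq Rel | 0.720 | # 16 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | RMSE log | 0.175 | # 10 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | Delta < 1.25 | 0.897 | # 11 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | Delta < 1.25^2 | 0.964 | # 12 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | Delta < 1.25^3 | 0.984 | # 5 |
| Monocular Depth Estimation | KITTI Eigen split unsupervised | DynamicDepth (M+640x192) | Mono | X | # 1 |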
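
For readers unfamiliar with the metric names in the results above, the following is a minimal sketch of how the error and accuracy measures are commonly defined in the depth-estimation literature. It is not taken from the DynamicDepth code base: the function name `depth_metrics` and its dictionary keys are hypothetical, and the sketch assumes paired lists of strictly positive predicted and ground-truth depths. Benchmark evaluation protocols typically add further steps, such as depth capping or scale alignment, which are omitted here.

```python
import math


def depth_metrics(pred, gt):
    """Compute common depth-evaluation metrics over paired depth values.

    pred, gt: equal-length sequences of strictly positive depths.
    Hypothetical helper for illustration only.
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Absolute relative error: mean of |d - d*| / d*
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Square relative error: mean of (d - d*)^2 / d*
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root mean squared error in linear depth
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Root mean squared error in log depth
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    # Threshold accuracy: fraction of values with max(d/d*, d*/d) < 1.25^k
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta_1": deltas[1],
        "delta_2": deltas[2],
        "delta_3": deltas[3],
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(depth_metrics([2.0, 4.0], [2.0, 5.0]))
```

Lower is better for the four error metrics, while higher is better for the three Delta accuracies, which is why the table lists them separately.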