Towards Comprehensive Representation Enhancement in Semantics-guided Self-supervised Monocular Depth Estimation

Semantics-guided self-supervised monocular depth estimation has been widely researched, owing to the strong cross-task correlation between depth and semantics. However, depth estimation and semantic segmentation are fundamentally different types of tasks: one is regression while the other is classification, so the distributions of depth features and semantic features naturally differ. Previous works that leverage semantic information in depth estimation mostly neglect this representational difference, which leads to insufficient enhancement of the depth feature representation. In this work, we propose an attention-based module that enhances task-specific features by addressing their feature uniqueness within instances. Additionally, we propose a metric-learning-based approach that achieves comprehensive enhancement of depth features by creating a separation between instances in feature space. Extensive experiments and analysis demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on the KITTI dataset.
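
The abstract does not say which metric-learning objective is used. As a rough illustration of what "creating a separation between instances in feature space" can mean, the sketch below uses a generic triplet margin loss, which is an assumption and not the paper's formulation. The function names, the `margin` parameter, and the toy feature vectors are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (an assumed stand-in, not the paper's loss).

    anchor and positive are features taken from the same instance;
    negative is a feature from a different instance. The loss is zero once
    the negative is at least `margin` farther from the anchor than the
    positive is.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Toy 2-D features: the far negative is already separated, the near one is not.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
loss_far = triplet_margin_loss(anchor, positive, [3.0, 4.0])
loss_near = triplet_margin_loss(anchor, positive, [0.0, 1.5])
```

Minimizing a loss of this kind pulls features of the same instance together and pushes features of different instances apart, which is the general effect the abstract describes.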


Datasets


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Monocular Depth Estimation | KITTI Eigen split unsupervised | CREMono(M + 1024x320 + Res50) | absolute relative error | 0.099 | # 11 |
| | | | RMSE | 4.165 | # 5 |
| | | | Sq Rel | 0.624 | # 5 |
| | | | RMSE log | 0.171 | # 5 |
| | | | Delta < 1.25 | 0.902 | # 9 |
| | | | Delta < 1.25^2 | 0.969 | # 3 |
| | | | Delta < 1.25^3 | 0.986 | # 1 |
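
The metric names above follow the definitions commonly used for KITTI depth evaluation. The sketch below shows how they can be computed from paired ground-truth and predicted depths. It is a minimal illustration rather than the benchmark's evaluation code: the function name `depth_metrics` and the toy inputs are hypothetical, and preprocessing steps commonly applied in practice (valid-pixel masking, depth capping, median scaling for monocular models) are omitted.

```python
import math


def depth_metrics(gt, pred):
    """Compute standard depth-estimation metrics.

    gt, pred: equal-length sequences of strictly positive depths (metres).
    Returns a dict keyed by metric name. Delta values are fractions in [0, 1].
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(gt, pred))

    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )

    # Threshold accuracy: share of points whose ratio max(gt/pred, pred/gt)
    # falls below 1.25, 1.25^2 and 1.25^3 respectively.
    ratios = [max(g / p, p / g) for g, p in pairs]
    d1, d2, d3 = (sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3))

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta_1": d1,
        "delta_2": d2,
        "delta_3": d3,
    }


# Toy example: one perfect prediction, one that underestimates by half.
metrics = depth_metrics([10.0, 20.0], [10.0, 10.0])
```

Lower is better for the four error metrics, and higher is better for the three Delta accuracies.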
