SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning

7 Jul 2024 · Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. Inaccurate camera pose estimates inevitably degrade the photometric reconstruction and mislead the depth estimation network with wrong supervisory signals. In this article, we introduce SCIPaD, a novel approach that incorporates spatial clues for unsupervised depth-pose joint learning. Specifically, a confidence-aware feature flow estimator is proposed to acquire 2D feature positional translations and their associated confidence levels. Meanwhile, we introduce a positional clue aggregator, which integrates pseudo 3D point clouds from DepthNet and 2D feature flows into homogeneous positional representations. Finally, a hierarchical positional embedding injector is proposed to selectively inject spatial clues into semantic features for robust camera pose decoding. Extensive experiments and analyses demonstrate the superior performance of our model compared to other state-of-the-art methods. Remarkably, SCIPaD achieves a reduction of 22.2% in average translation error and 34.8% in average angular error on the camera pose estimation task of the KITTI Odometry dataset. Our source code is available at https://mias.group/SCIPaD.

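As a rough illustration of the positional clue aggregator described in the abstract, the sketch below back-projects a predicted depth map into a pseudo 3D point cloud using the camera intrinsics and concatenates it with a 2D feature flow field into a single positional tensor. The function names, the simple concatenation scheme, and the intrinsic values are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def backproject_to_pseudo_pointcloud(depth, K):
    """Back-project a depth map into a pseudo 3D point cloud in the camera frame.

    depth: (H, W) array of predicted depths (e.g., from DepthNet)
    K:     (3, 3) camera intrinsic matrix
    Returns: (H, W, 3) array of 3D points.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel coordinate grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixels, (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                        # normalized camera rays
    return rays * depth[..., None]                         # scale each ray by its depth

def aggregate_positional_clues(depth, K, feature_flow):
    """Combine pseudo 3D points and 2D feature flow into one positional representation.

    feature_flow: (H, W, 2) 2D positional translations of features between frames.
    Returns: (H, W, 5) concatenated positional tensor (illustrative scheme only).
    """
    points = backproject_to_pseudo_pointcloud(depth, K)
    return np.concatenate([points, feature_flow], axis=-1)

# Toy usage with a constant depth map, zero flow, and example KITTI-like intrinsics.
K = np.array([[718.9,   0.0, 607.2],
              [  0.0, 718.9, 185.2],
              [  0.0,   0.0,   1.0]])
depth = np.full((192, 640), 10.0)
flow = np.zeros((192, 640, 2))
clues = aggregate_positional_clues(depth, K, flow)
print(clues.shape)  # (192, 640, 5)
```

In the actual model, such a positional tensor would be fused with semantic features by the hierarchical positional embedding injector before camera pose decoding; the sketch only shows the geometric aggregation step.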
| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Monocular Depth Estimation | KITTI Eigen split unsupervised | SCIPaD | absolute relative error | 0.090 | #4 |
| | | | RMSE | 4.056 | #4 |
| | | | Sq Rel | 0.650 | #6 |
| | | | RMSE log | 0.166 | #2 |
| | | | Delta < 1.25 | 0.918 | #2 |
| | | | Delta < 1.25^2 | 0.970 | #2 |
| | | | Delta < 1.25^3 | 0.985 | #2 |
| | | | Resolution | 640x192 | #1 |
| Camera Pose Estimation | KITTI Odometry Benchmark | SCIPaD | Absolute Trajectory Error [m] | 20.83 | #1 |
| | | | Average Translational Error e_t [%] | 8.63 | #1 |
| | | | Average Rotational Error e_r [%] | 3.17 | #1 |
