Improving Depth Completion via Depth Feature Upsampling
The encoder-decoder network (ED-Net) is a commonly employed choice for existing depth completion methods but its working mechanism is ambiguous. In this paper we visualize the internal feature maps to analyze how the network densifies the input sparse depth. We find that the encoder feature of ED-Net focus on the areas with input depth points around. To obtain a dense feature and thus estimate complete depth the decoder feature tends to complement and enhance the encoder feature by skip-connection to make the fused encoder-decoder feature dense resulting in the decoder feature also exhibits sparse. However ED-Net obtains the sparse decoder feature from the dense fused feature at the previous stage where the "dense to sparse" process destroys the completeness of features and loses information. To address this issue we present a depth feature upsampling network (DFU) that explicitly utilizes these dense features to guide the upsampling of a low-resolution (LR) depth feature to a high-resolution (HR) one. The completeness of features is maintained throughout the upsampling process thus avoiding information loss. Furthermore we propose a confidence-aware guidance module (CGM) which is confidence-aware and performs guidance with adaptive receptive fields (GARF) to fully exploit the potential of these dense features as guidance. Experimental results show that our DFU a plug-and-play module can significantly improve the performance of existing ED-Net based methods with limited computational overheads and new SOTA results are achieved. Besides the generalization capability on sparser depth is also enhanced. Project page: https://npucvr.github.io/DFU.
PDF Abstract