Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

26 Feb 2025  ·  Xiankang He, Dongyan Guo, Hongji Li, Ruibo Li, Ying Cui, Chi Zhang ·

Monocular depth estimation (MDE) aims to predict scene depth from a single RGB image and plays a crucial role in 3D scene understanding. Recent advances in zero-shot MDE leverage normalized depth representations and distillation-based learning to improve generalization across diverse scenes. However, current depth normalization strategies for distillation rely on global statistics, which can amplify noise in pseudo-labels and reduce distillation effectiveness. In this paper, we systematically analyze the impact of different depth normalization strategies on pseudo-label distillation. Based on our findings, we propose Cross-Context Distillation, which integrates global and local depth cues to enhance pseudo-label quality. Additionally, we introduce a multi-teacher distillation framework that leverages the complementary strengths of different depth estimation models, leading to more robust and accurate depth predictions. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
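The core idea of Cross-Context Distillation, as described in the abstract, is to supervise the student with pseudo-labels normalized both over the full image (global context) and over local crops, so that local structure is not washed out by global statistics. A minimal sketch of such a combined loss is below; the function names, the median/mean-absolute-deviation normalization, and the crop convention are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def affine_invariant(d, eps=1e-6):
    """Normalize a depth map to be invariant to scale and shift:
    subtract the median, divide by the mean absolute deviation."""
    t = np.median(d)
    s = np.mean(np.abs(d - t)) + eps
    return (d - t) / s

def cross_context_distillation_loss(student, teacher, crop):
    """Sketch of a cross-context loss: an L1 term on globally normalized
    depth plus an L1 term on a locally re-normalized crop.
    `crop` is (y0, y1, x0, x1); the crop would typically be sampled
    randomly during training."""
    # Global context: normalize over the whole image.
    l_global = np.mean(np.abs(affine_invariant(student) - affine_invariant(teacher)))
    # Local context: re-normalize inside the crop, so local relative
    # depth is supervised independently of global statistics.
    y0, y1, x0, x1 = crop
    s_loc = student[y0:y1, x0:x1]
    t_loc = teacher[y0:y1, x0:x1]
    l_local = np.mean(np.abs(affine_invariant(s_loc) - affine_invariant(t_loc)))
    return l_global + l_local
```

Because each term is computed after affine-invariant normalization, the loss is zero whenever the student agrees with the teacher up to a per-context scale and shift, which is the invariance that normalized-depth distillation targets.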


Results from the Paper


Task                        Dataset       Model              Metric                   Value   Global Rank
Monocular Depth Estimation  ETH3D         Distill Any Depth  Delta < 1.25             0.981   # 1
Monocular Depth Estimation  ETH3D         Distill Any Depth  absolute relative error  0.054   # 7
Monocular Depth Estimation  NYU-Depth V2  Distill Any Depth  absolute relative error  0.043   # 2
Monocular Depth Estimation  NYU-Depth V2  Distill Any Depth  Delta < 1.25             0.981   # 7
Depth Estimation            ScanNetV2     Distill Any Depth  absolute relative error  0.042   # 1
Depth Estimation            ScanNetV2     Distill Any Depth  Delta < 1.25             0.980   # 1

Methods


No methods listed for this paper.