TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Monocular Depth Estimation	NYU-Depth V2	MeSa	RMSE	0.238	# 10
Monocular Depth Estimation	NYU-Depth V2	MeSa	absolute relative error	0.066	# 10
Monocular Depth Estimation	NYU-Depth V2	MeSa	Delta < 1.25	0.964	# 10
Monocular Depth Estimation	NYU-Depth V2	MeSa	Delta < 1.25^2	0.995	# 11
Monocular Depth Estimation	NYU-Depth V2	MeSa	Delta < 1.25^3	0.999	# 4
Monocular Depth Estimation	NYU-Depth V2	MeSa	log 10	0.029	# 10
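
The metric names in the table follow the conventions commonly used for depth benchmarks. As a rough illustration, the sketch below computes them over flat lists of predicted and ground-truth depths. It assumes the standard definitions (RMSE, mean absolute relative error, mean absolute log10 error, and threshold accuracy on max(pred/gt, gt/pred)); the function name `depth_metrics`, the dictionary keys, and the rule of skipping pixels with non-positive ground truth are illustrative choices, not taken from the paper or from the leaderboard's evaluation code.

```python
import math


def depth_metrics(pred, gt):
    """Compute common monocular depth metrics over flat sequences of depths.

    Assumes predictions are strictly positive. Pixels whose ground-truth
    depth is not positive are treated as invalid and skipped (an assumption
    of this sketch, not a documented benchmark rule).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid ground-truth depths")

    # Root mean squared error.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean absolute difference of base-10 logarithms.
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n

    # Threshold accuracy: fraction of pixels with max ratio below 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {
        f"delta{k}": sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)
    }

    return {"rmse": rmse, "abs_rel": abs_rel, "log10": log10, **deltas}


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    print(depth_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 2.0]))
```

Lower is better for RMSE, absolute relative error, and log10; higher is better for the three Delta thresholds.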

MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation

6 Oct 2023 · Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou

Pre-training has been an important ingredient in developing strong monocular depth estimation models in recent years. For instance, self-supervised learning (SSL) is particularly effective by alleviating the need for large datasets with dense ground-truth depth maps. However, despite these improvements, our study reveals that the later layers of the SOTA SSL method are actually suboptimal. By examining the layer-wise representations, we demonstrate significant changes in these later layers during fine-tuning, indicating the ineffectiveness of their pre-trained features for depth estimation. To address these limitations, we propose MeSa, a comprehensive framework that leverages the complementary strengths of masked, geometric, and supervised pre-training. Hence, MeSa benefits from not only general-purpose representations learnt via masked pre-training but also specialized depth-specific features acquired via geometric and supervised pre-training. Our CKA layer-wise analysis confirms that our pre-training strategy indeed produces improved representations for the later layers, overcoming the drawbacks of the SOTA SSL method. Furthermore, via experiments on the NYUv2 and IBims-1 datasets, we demonstrate that these enhanced representations translate to performance improvements in both the in-distribution and out-of-distribution settings. We also investigate the influence of the pre-training dataset and demonstrate the efficacy of pre-training on LSUN, which yields significantly better pre-trained representations. Overall, our approach surpasses the masked pre-training SSL method by a substantial margin of 17.1% on the RMSE. Moreover, even without utilizing any recently proposed techniques, MeSa also outperforms the most recent methods and establishes a new state-of-the-art for monocular depth estimation on the challenging NYUv2 dataset.
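
The layer-wise comparison mentioned in the abstract relies on CKA (centered kernel alignment). As a minimal, dependency-free sketch, the code below computes the linear variant of CKA between two activation matrices recorded on the same inputs, for example one layer before and after fine-tuning. This page does not state which CKA variant or implementation the authors used, so linear CKA is an assumption here, and the names `center_columns`, `cross_frobenius_sq`, and `linear_cka` are illustrative.

```python
import math


def center_columns(m):
    """Subtract each column's mean from a row-major matrix (list of rows)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for matrices a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between activations x (n x p) and y (n x q).

    Rows are examples (the same n inputs for both matrices); columns are
    features. Features that are constant across all examples in either
    matrix make the denominator zero, which this sketch does not handle.
    """
    x, y = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(y, x)
    denominator = math.sqrt(cross_frobenius_sq(x, x) * cross_frobenius_sq(y, y))
    return numerator / denominator


if __name__ == "__main__":
    # Toy activations for illustration only.
    before = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
    after = [[2.0, 4.0], [6.0, 8.0], [10.0, 14.0]]  # same features, rescaled
    print(linear_cka(before, after))
```

Values near 1 indicate that two layers encode similar representations, so a low score between a pre-trained layer and its fine-tuned counterpart suggests that the layer changed substantially during fine-tuning.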

Code

No code implementations yet.

Tasks

Depth Estimation

Monocular Depth Estimation

Self-Supervised Learning

Datasets

NYUv2

LSUN

Results from the Paper

Ranked #10 on Monocular Depth Estimation on NYU-Depth V2

Methods

No methods listed for this paper.
