Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion

10 Jul 2022 · Ashutosh Agarwal, Chetan Arora

Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their capability of capturing long-range dependencies in an image. However, the benefit of transformers for monocular depth prediction has seldom been explored so far. This paper benchmarks various transformer-based models for the depth estimation task on the indoor NYUV2 dataset and the outdoor KITTI dataset. We propose a novel attention-based architecture, Depthformer, for monocular depth estimation that uses multi-head self-attention to produce multiscale feature maps, which are effectively combined by our proposed decoder network. We also propose a Transbins module that divides the depth range into bins whose center values are estimated adaptively per image. The final depth estimate for each pixel is a linear combination of the bin centers. The Transbins module takes advantage of the global receptive field provided by the transformer module in the encoding stage. Experimental results on the NYUV2 and KITTI depth estimation benchmarks demonstrate that our proposed method improves the state of the art by 3.3% and 3.3%, respectively, in terms of Root Mean Squared Error (RMSE). Code is available at https://github.com/ashutosh1807/Depthformer.git.
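
To make the adaptive-binning step concrete, below is a minimal PyTorch sketch of the general idea (an AdaBins/Transbins-style head), not the authors' released code: a hypothetical AdaptiveBinHead predicts per-image bin widths from a globally pooled feature, converts them to bin centers over a fixed depth range, and reads out per-pixel depth as a softmax-weighted linear combination of those centers. Names such as AdaptiveBinHead, num_bins, and feat_dim are illustrative assumptions.

import torch
import torch.nn as nn


class AdaptiveBinHead(nn.Module):
    """Hypothetical adaptive-binning depth head (illustrative, not the paper's code)."""

    def __init__(self, feat_dim: int, num_bins: int = 256,
                 min_depth: float = 1e-3, max_depth: float = 10.0):
        super().__init__()
        self.min_depth, self.max_depth = min_depth, max_depth
        # Predicts the relative width of each depth bin from a global feature.
        self.bin_widths = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_bins), nn.Softplus(),
        )
        # Per-pixel logits over the bins.
        self.bin_logits = nn.Conv2d(feat_dim, num_bins, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) decoder features.
        global_feat = feats.mean(dim=(2, 3))                  # (B, C)
        widths = self.bin_widths(global_feat)                 # (B, K)
        widths = widths / widths.sum(dim=1, keepdim=True)     # normalize to 1
        widths = widths * (self.max_depth - self.min_depth)
        edges = self.min_depth + torch.cumsum(widths, dim=1)  # right edge of each bin
        centers = edges - 0.5 * widths                        # (B, K) bin centers
        probs = torch.softmax(self.bin_logits(feats), dim=1)  # (B, K, H, W)
        # Depth = linear combination of bin centers, weighted per pixel.
        depth = torch.einsum("bkhw,bk->bhw", probs, centers)
        return depth.unsqueeze(1)                             # (B, 1, H, W)


if __name__ == "__main__":
    head = AdaptiveBinHead(feat_dim=128)
    out = head(torch.randn(2, 128, 60, 80))
    print(out.shape)  # torch.Size([2, 1, 60, 80])

In the paper's setting such a head would operate on the decoder's fused multiscale transformer features; the snippet feeds random features only to show the expected shapes.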


Datasets

KITTI · NYU-Depth V2

Results from the Paper


Ranked #29 on Monocular Depth Estimation on KITTI Eigen split (using extra training data)

Task: Monocular Depth Estimation · Dataset: KITTI Eigen split · Model: Depthformer (uses extra training data)

    Metric                    Value    Global Rank
    absolute relative error   0.058    # 29
    RMSE                      2.285    # 27
    Sq Rel                    0.187    # 3
    RMSE log                  0.087    # 26
    Delta < 1.25              0.967    # 26
    Delta < 1.25^2            0.996    # 25
    Delta < 1.25^3            0.999    # 11

Task: Monocular Depth Estimation · Dataset: NYU-Depth V2 · Model: Depthformer

    Metric                    Value    Global Rank
    RMSE                      0.345    # 33
    absolute relative error   0.100    # 38
    Delta < 1.25              0.913    # 35
    Delta < 1.25^2            0.988    # 28
    Delta < 1.25^3            0.997    # 27
    log 10                    0.042    # 33
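
For reference (these definitions are not part of the benchmark page but are the standard monocular depth evaluation protocol), with $d_i$ the predicted depth, $d_i^*$ the ground-truth depth, and $N$ the number of valid pixels, the metrics above are typically defined as:

$\mathrm{AbsRel} = \frac{1}{N}\sum_i \frac{|d_i - d_i^*|}{d_i^*}, \qquad \mathrm{SqRel} = \frac{1}{N}\sum_i \frac{(d_i - d_i^*)^2}{d_i^*},$

$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_i (d_i - d_i^*)^2}, \qquad \mathrm{RMSE}_{\log} = \sqrt{\frac{1}{N}\sum_i (\log d_i - \log d_i^*)^2},$

$\log_{10} = \frac{1}{N}\sum_i \bigl|\log_{10} d_i - \log_{10} d_i^*\bigr|, \qquad \delta_t = \frac{1}{N}\bigl|\{\, i : \max(d_i/d_i^*,\; d_i^*/d_i) < 1.25^t \,\}\bigr|, \quad t \in \{1,2,3\}.$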
