TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	NYU Depth v2	InvPT	Mean IoU	53.56%	# 22
Surface Normal Estimation	NYU-Depth V2	InvPT	Mean Angle Error	19.04	# 2
Boundary Detection	NYU-Depth V2	InvPT	odsF	78.1	# 1
Monocular Depth Estimation	NYU-Depth V2	InvPT	RMSE	0.5183	# 57
Saliency Detection	PASCAL Context	InvPT	max_F1	84.81	# 1
Human Parsing	PASCAL Context	InvPT	mIoU	67.61	# 1
Boundary Detection	PASCAL Context	InvPT	odsF	73	# 1
Surface Normals Estimation	PASCAL Context	InvPT	Mean Angle Error	14.15	# 1
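
For readers unfamiliar with the metric names in the table, the following is a minimal sketch of three of them (Mean IoU, RMSE, Mean Angle Error) in plain Python. The function names and the flat-list input layout are assumptions made for illustration, and the angle error is assumed to be in degrees. Benchmark evaluation code typically also handles ignore labels, invalid-depth masks, and accumulation over a whole dataset, none of which is shown here. odsF and max_F1 are omitted because they require a threshold sweep over predicted maps. Mean IoU is higher-is-better; RMSE and Mean Angle Error are lower-is-better.

```python
import math

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

def rmse(pred, target):
    """Root mean squared error between two equally long lists of depths."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def mean_angle_error(pred_normals, target_normals):
    """Mean angle (degrees, assumed) between predicted and reference normals."""
    angles = []
    for p, t in zip(pred_normals, target_normals):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)

# Toy inputs, not data from the paper.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
print(rmse([1.0, 2.0], [1.0, 4.0]))
print(mean_angle_error([(1, 0, 0), (0, 0, 1)], [(0, 1, 0), (0, 0, 1)]))
```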

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/boundary-detection-on-nyu-depth-v2)](https://paperswithcode.com/sota/boundary-detection-on-nyu-depth-v2?p=inverted-pyramid-multi-task-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/saliency-detection-on-pascal-context)](https://paperswithcode.com/sota/saliency-detection-on-pascal-context?p=inverted-pyramid-multi-task-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/human-parsing-on-pascal-context)](https://paperswithcode.com/sota/human-parsing-on-pascal-context?p=inverted-pyramid-multi-task-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/boundary-detection-on-pascal-context)](https://paperswithcode.com/sota/boundary-detection-on-pascal-context?p=inverted-pyramid-multi-task-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/surface-normals-estimation-on-pascal-context)](https://paperswithcode.com/sota/surface-normals-estimation-on-pascal-context?p=inverted-pyramid-multi-task-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/surface-normal-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/surface-normal-estimation-on-nyu-depth-v2?p=inverted-pyramid-multi-task-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/semantic-segmentation-on-nyu-depth-v2)](https://paperswithcode.com/sota/semantic-segmentation-on-nyu-depth-v2?p=inverted-pyramid-multi-task-transformer-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/inverted-pyramid-multi-task-transformer-for/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=inverted-pyramid-multi-task-transformer-for)`
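
The badge lines above all follow one URL pattern, which can be reproduced with a small helper. This is a sketch inferred only from the eight lines listed here; the function name `pwc_badge` and the variable names are hypothetical, and the pattern is not checked against any official specification.

```python
def pwc_badge(paper_slug, benchmark_slug):
    """Build badge Markdown in the same shape as the lines listed above."""
    badge = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{benchmark_slug}"
    )
    link = f"https://paperswithcode.com/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({badge})]({link})"

# Slugs copied from the badge lines above.
PAPER = "inverted-pyramid-multi-task-transformer-for"
BENCHMARKS = [
    "boundary-detection-on-nyu-depth-v2",
    "saliency-detection-on-pascal-context",
    "human-parsing-on-pascal-context",
    "boundary-detection-on-pascal-context",
    "surface-normals-estimation-on-pascal-context",
    "surface-normal-estimation-on-nyu-depth-v2",
    "semantic-segmentation-on-nyu-depth-v2",
    "monocular-depth-estimation-on-nyu-depth-v2",
]

for benchmark in BENCHMARKS:
    print(pwc_badge(PAPER, benchmark))
```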

InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding

15 Mar 2022 · Hanrong Ye, Dan Xu

Multi-task dense scene understanding is a thriving research domain that requires simultaneous perception and reasoning over a series of correlated tasks with pixel-wise prediction. Most existing works rely heavily on convolution operations and are therefore largely limited to modeling local context, whereas learning interactions and performing inference in a global spatial-position and multi-task context is critical for this problem. In this paper, we propose a novel end-to-end Inverted Pyramid multi-task Transformer (InvPT) that models spatial positions and multiple tasks simultaneously in a unified framework. To the best of our knowledge, this is the first work to explore a transformer structure for multi-task dense prediction in scene understanding. Moreover, it has been widely demonstrated that higher spatial resolution is highly beneficial for dense prediction, yet it is very challenging for existing transformers to go deeper at higher resolutions because their complexity grows rapidly with spatial size. InvPT introduces an efficient UP-Transformer block that learns multi-task feature interaction at gradually increasing resolutions and incorporates effective self-attention message passing and multi-scale feature aggregation to produce task-specific predictions at high resolution. Our method achieves superior multi-task performance on the NYUD-v2 and PASCAL-Context datasets and significantly outperforms previous state-of-the-art methods. The code is available at https://github.com/prismformore/InvPT
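
The resolution concern in the abstract can be made concrete with a small sketch. The code below is not the UP-Transformer block from the paper. It is plain single-head scaled dot-product self-attention with identity projections, plus a counter for the number of pairwise attention scores when tokens from several tasks are placed in one sequence. That joint-sequence layout, the function names, and the example sizes are all assumptions chosen for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens (no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def attention_pairs(height, width, num_tasks):
    """Number of pairwise scores if every token attends to every other token.

    Assumes one token per spatial position per task in a single joint sequence.
    """
    n = height * width * num_tasks
    return n * n

# Doubling height and width quadruples the token count and
# multiplies the number of attention scores by sixteen.
low = attention_pairs(14, 14, num_tasks=4)
high = attention_pairs(28, 28, num_tasks=4)
print(low, high, high // low)
```

The quadratic growth in `attention_pairs` is the reason a naive transformer becomes expensive at high resolution, which is the problem the abstract says the UP-Transformer block is designed to address.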

Code

prismformore/InvPT official

269

Tasks

Boundary Detection

Human Parsing

Monocular Depth Estimation

Saliency Detection

Scene Understanding

Semantic Segmentation

Surface Normal Estimation

Surface Normals Estimation

Datasets

NYUv2

PASCAL Context

Results from the Paper

Ranked #1 on Boundary Detection on NYU-Depth V2

Methods

Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
