TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Object Detection	nuScenes	UniTR	NDS	0.75	# 8
3D Object Detection	nuScenes	UniTR	mAP	0.71	# 18
3D Object Detection	nuScenes	UniTR	mATE	0.24	# 335
3D Object Detection	nuScenes	UniTR	mASE	0.23	# 321
3D Object Detection	nuScenes	UniTR	mAOE	0.26	# 359
3D Object Detection	nuScenes	UniTR	mAVE	0.24	# 300
3D Object Detection	nuScenes	UniTR	mAAE	0.13	# 103
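
As a sanity check on how the rows above relate, the sketch below recomputes NDS from the other six metrics using the nuScenes detection score definition, NDS = (1/10) * [5 * mAP + sum over the five true-positive errors of (1 - min(1, error))]. The function name `nuscenes_detection_score` and the dictionary layout are illustrative choices, not part of the nuScenes devkit, and the inputs are the two-decimal values listed in the table, so the result is only approximate.

```python
def nuscenes_detection_score(m_ap, tp_errors):
    """Combine mAP and the five mean true-positive errors into NDS.

    Hypothetical helper for illustration; it follows the published
    nuScenes formula but is not the official devkit implementation.
    """
    # Each error is clipped to 1 and turned into a score where higher is better.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    # mAP carries weight 5; the five TP scores carry weight 1 each.
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Two-decimal values copied from the results table for UniTR.
unitr_tp_errors = {
    "mATE": 0.24,  # translation error
    "mASE": 0.23,  # scale error
    "mAOE": 0.26,  # orientation error
    "mAVE": 0.24,  # velocity error
    "mAAE": 0.13,  # attribute error
}

nds = nuscenes_detection_score(0.71, unitr_tp_errors)
print(f"NDS from rounded inputs: {nds:.3f}")
```

With these rounded inputs the formula gives about 0.745, which is consistent with the NDS of 0.75 listed above. The five error metrics are lower-is-better, while NDS and mAP are higher-is-better.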

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

ICCV 2023 · Haiyang Wang, Hao Tang, Shaoshuai Shi, Aoxue Li, Zhenguo Li, Bernt Schiele, LiWei Wang

Jointly processing information from multiple sensors is crucial to achieving accurate and robust perception for reliable autonomous driving systems. However, current 3D perception research follows a modality-specific paradigm, which adds computational overhead and makes collaboration between different sensor data inefficient. In this paper, we present an efficient multi-modal backbone for outdoor 3D perception named UniTR, which processes a variety of modalities with unified modeling and shared parameters. Unlike previous works, UniTR introduces a modality-agnostic transformer encoder to handle these view-discrepant sensor data, enabling parallel modal-wise representation learning and automatic cross-modal interaction without additional fusion steps. More importantly, to make full use of these complementary sensor types, we present a novel multi-modal integration strategy that considers both semantic-abundant 2D perspective relations and geometry-aware 3D sparse neighborhood relations. UniTR is also a fundamentally task-agnostic backbone that naturally supports different 3D perception tasks. It sets a new state of the art on the nuScenes benchmark, improving NDS by 1.1 for 3D object detection and mIoU by 12.0 for BEV map segmentation, with lower inference latency. Code will be available at https://github.com/Haiyang-W/UniTR.

Code

haiyang-w/unitr (official): 240 stars

haiyang-w/dsvt: 327 stars

Tasks

3D Object Detection

Autonomous Driving

Object Detection

Representation Learning

Datasets

nuScenes

Results from the Paper

Ranked #8 on 3D Object Detection on nuScenes
