TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
3D Object Detection	DAIR-V2X-I	BEVFormer	AP|R40 (moderate)	50.7	#8
3D Object Detection	DAIR-V2X-I	BEVFormer	AP|R40 (easy)	61.4	#8
3D Object Detection	DAIR-V2X-I	BEVFormer	AP|R40 (hard)	50.7	#8
Bird's-Eye View Semantic Segmentation	Lyft Level 5	BEVFormer (EfficientNet-b4)	IoU vehicle - 224x480 - Long	44.5	#2
Bird's-Eye View Semantic Segmentation	Lyft Level 5	BEVFormer (EfficientNet-b4)	IoU vehicle - 224x480 - Short	69.9	#5
Bird's-Eye View Semantic Segmentation	Lyft Level 5	BEVFormer (ResNet-50)	IoU vehicle - 224x480 - Long	43.2	#6
Bird's-Eye View Semantic Segmentation	Lyft Level 5	BEVFormer (ResNet-50)	IoU vehicle - 224x480 - Short	68.8	#6
3D Object Detection	nuScenes	BEVFormer	NDS	0.57	#213
3D Object Detection	nuScenes	BEVFormer	mAP	0.48	#205
3D Object Detection	nuScenes	BEVFormer	mATE	0.58	#86
3D Object Detection	nuScenes	BEVFormer	mASE	0.26	#47
3D Object Detection	nuScenes	BEVFormer	mAOE	0.38	#167
3D Object Detection	nuScenes	BEVFormer	mAVE	0.38	#138
3D Object Detection	nuScenes	BEVFormer	mAAE	0.13	#103
Bird's-Eye View Semantic Segmentation	nuScenes	BEVFormer	IoU veh - 224x480 - No vis filter - 100x100 at 0.5	35.8	#6
Bird's-Eye View Semantic Segmentation	nuScenes	BEVFormer	IoU veh - 448x800 - No vis filter - 100x100 at 0.5	39.0	#4
Bird's-Eye View Semantic Segmentation	nuScenes	BEVFormer	IoU veh - 224x480 - Vis filter - 100x100 at 0.5	42.0	#4
Bird's-Eye View Semantic Segmentation	nuScenes	BEVFormer	IoU veh - 448x800 - Vis filter - 100x100 at 0.5	45.5	#4
Bird's-Eye View Semantic Segmentation	nuScenes	BEVFormer	IoU lane - 224x480 - 100x100 at 0.5	25.7	#5
Robust Camera Only 3D Object Detection	nuScenes-C	BEVFormer (small)	mean Corruption Error (mCE)	102.4	#14
Robust Camera Only 3D Object Detection	nuScenes-C	BEVFormer (small)	mean Resilience Rate (mRR)	59.07	#14
Robust Camera Only 3D Object Detection	nuScenes-C	BEVFormer (base)	mean Corruption Error (mCE)	97.97	#3
Robust Camera Only 3D Object Detection	nuScenes-C	BEVFormer (base)	mean Resilience Rate (mRR)	60.4	#13
3D Object Detection	nuScenes Camera Only	BEVFormer	NDS	56.9	#17
3D Object Detection	nuScenes Camera Only	BEVFormer	Future Frame	false	#1
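The nuScenes NDS in the table is a composite of mAP and the five true-positive error metrics listed beside it. The minimal sketch below assumes the standard nuScenes definition, NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))] over mATE, mASE, mAOE, mAVE and mAAE, and checks that BEVFormer's rounded component scores reproduce its reported NDS. The function name `nuscenes_nds` is illustrative only and is not the nuScenes devkit API.

```python
# nuScenes Detection Score (NDS), assuming the standard nuScenes definition:
#   NDS = (1/10) * [5 * mAP + sum over the five TP metrics of (1 - min(1, mTP))]
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_nds(m_ap: float, tp_errors: dict[str, float]) -> float:
    """Combine mAP and the five true-positive error metrics into NDS (hypothetical helper)."""
    missing = [name for name in TP_METRICS if name not in tp_errors]
    if missing:
        raise KeyError(f"missing TP metrics: {missing}")
    # Each error is clipped at 1, so a very poor component contributes 0 rather than a negative score.
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0


# Values copied from the nuScenes rows of the table (fractions, not percentages).
bevformer_tp = {"mATE": 0.58, "mASE": 0.26, "mAOE": 0.38, "mAVE": 0.38, "mAAE": 0.13}
nds = nuscenes_nds(0.48, bevformer_tp)
print(f"NDS = {nds:.3f}")  # prints "NDS = 0.567"
```

Because the table values are rounded to two decimals, the recomputed 0.567 agrees with the reported 0.57 only after rounding.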

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-learning-bird-s-eye-view/bird-s-eye-view-semantic-segmentation-on-lyft)](https://paperswithcode.com/sota/bird-s-eye-view-semantic-segmentation-on-lyft?p=bevformer-learning-bird-s-eye-view)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-learning-bird-s-eye-view/robust-camera-only-3d-object-detection-on)](https://paperswithcode.com/sota/robust-camera-only-3d-object-detection-on?p=bevformer-learning-bird-s-eye-view)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-learning-bird-s-eye-view/bird-s-eye-view-semantic-segmentation-on)](https://paperswithcode.com/sota/bird-s-eye-view-semantic-segmentation-on?p=bevformer-learning-bird-s-eye-view)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-learning-bird-s-eye-view/3d-object-detection-on-dair-v2x-i)](https://paperswithcode.com/sota/3d-object-detection-on-dair-v2x-i?p=bevformer-learning-bird-s-eye-view)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-learning-bird-s-eye-view/3d-object-detection-on-nuscenes-camera-only)](https://paperswithcode.com/sota/3d-object-detection-on-nuscenes-camera-only?p=bevformer-learning-bird-s-eye-view)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bevformer-learning-bird-s-eye-view/3d-object-detection-on-nuscenes)](https://paperswithcode.com/sota/3d-object-detection-on-nuscenes?p=bevformer-learning-bird-s-eye-view)`

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

31 Mar 2022 · Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, in which each BEV query extracts spatial features from its regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state-of-the-art NDS of 56.9% on the nuScenes test set, 9.0 points higher than the previous best method and on par with LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions. The code is available at https://github.com/zhiqi-li/BEVFormer.
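The abstract describes spatial cross-attention only at a high level: each grid-shaped BEV query gathers features from its regions of interest across camera views. The sketch below illustrates just the geometric part of that step, under stated assumptions: each BEV cell center is lifted to a pillar of reference points at a few heights, projected into every camera with a 3×4 projection matrix, and a camera counts as a "hit" view when at least one projected point lies in front of it and inside its image. The camera rig (the hypothetical `pinhole_camera` helper, two cameras, focal length, image size, mount height) and all grid and height values are illustrative assumptions, not the paper's settings; the learned attention that samples features around those pixels, and temporal self-attention, are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera: 3x4 projection matrix (K @ [R|t]) plus image size."""
    name: str
    proj: list[list[float]]  # maps ego-frame (x, y, z, 1) to (u*d, v*d, d)
    width: int
    height: int


def pinhole_camera(name, yaw, f=500.0, width=800, height=450, mount_height=1.5):
    """Assumed camera at (0, 0, mount_height) looking along `yaw` (radians); ego frame is x fwd, y left, z up."""
    fwd = (math.cos(yaw), math.sin(yaw), 0.0)
    right = (math.sin(yaw), -math.cos(yaw), 0.0)
    down = (0.0, 0.0, -1.0)
    center = (0.0, 0.0, mount_height)
    # Rows of [R|t]: camera x = right, y = down, z = forward; t = -R @ center.
    rt = [list(axis) + [-sum(a * c for a, c in zip(axis, center))] for axis in (right, down, fwd)]
    k = [[f, 0.0, width / 2], [0.0, f, height / 2], [0.0, 0.0, 1.0]]
    proj = [[sum(k[i][m] * rt[m][j] for m in range(3)) for j in range(4)] for i in range(3)]
    return Camera(name, proj, width, height)


def bev_query_centers(bev_h, bev_w, x_range, y_range):
    """Ego-frame center (metres) of every cell in a bev_h x bev_w BEV grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    cell_x, cell_y = (x1 - x0) / bev_w, (y1 - y0) / bev_h
    return [(x0 + (j + 0.5) * cell_x, y0 + (i + 0.5) * cell_y)
            for i in range(bev_h) for j in range(bev_w)]


def project(cam, point):
    """Project an ego-frame 3D point to pixels; None if it is behind the camera."""
    x, y, z = point
    u, v, d = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in cam.proj)
    if d <= 1e-5:
        return None
    return u / d, v / d


def hit_views(center, heights, cameras):
    """Lift one BEV query to a pillar of reference points and keep, per camera,
    the projected pixels that fall inside that camera's image."""
    hits = {}
    for cam in cameras:
        pts = []
        for z in heights:
            uv = project(cam, (center[0], center[1], z))
            if uv is not None and 0 <= uv[0] < cam.width and 0 <= uv[1] < cam.height:
                pts.append(uv)
        if pts:
            hits[cam.name] = pts
    return hits


if __name__ == "__main__":
    cams = [pinhole_camera("front", 0.0), pinhole_camera("back", math.pi)]
    queries = bev_query_centers(50, 50, (-51.2, 51.2), (-51.2, 51.2))  # illustrative grid
    heights = (-1.0, 0.0, 1.0, 2.0)  # illustrative pillar heights in metres
    print(len(queries))  # 2500 queries
    for q in [(10.0, 0.0), (-10.0, 0.0), (0.0, 10.0)]:
        print(q, {name: len(pts) for name, pts in hit_views(q, heights, cams).items()})
```

With this two-camera rig, the query ahead of the vehicle hits only the front view, the one behind hits only the back view, and the side query at (0, 10) hits no view at all, which is why a practical rig needs cameras covering every direction. Limiting each query to its hit views also keeps it from attending to cameras that cannot see its location.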

Code

zhiqi-li/BEVFormer (official)

fundamentalvision/BEVFormer (★ 2,843)

valeoai/pointbev

Tasks

3D Object Detection

Autonomous Driving

Bird's-Eye View Semantic Segmentation

Robust Camera Only 3D Object Detection

Datasets

nuScenes

Waymo Open Dataset

DAIR-V2X

nuScenes-C

Results from the Paper

Ranked #2 on Bird's-Eye View Semantic Segmentation on Lyft Level 5

Methods

No methods listed for this paper.
