TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	BiFormer-B (IN1k pretrain, Upernet 160k)	Validation mIoU	51.7	# 86
Semantic Segmentation	ADE20K	Upernet-BiFormer-S (IN1k pretrain, Upernet 160k)	Validation mIoU	50.8	# 102
Object Detection	COCO 2017	BiFormer-S (IN1k pretrain, MaskRCNN 12ep)	mAP	47.8	# 10
Object Detection	COCO 2017	BiFormer-B (IN1k pretrain, MaskRCNN 12ep)	mAP	48.6	# 9
Image Classification	ImageNet	BiFormer-B* (IN1k pretrain)	Top 1 Accuracy	85.4%	# 221
Image Classification	ImageNet	BiFormer-S* (IN1k pretrain)	Top 1 Accuracy	84.3%	# 305
Image Classification	ImageNet	BiFormer-T (IN1k pretrain)	Top 1 Accuracy	81.4%	# 586

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/biformer-vision-transformer-with-bi-level/object-detection-on-coco-2017)](https://paperswithcode.com/sota/object-detection-on-coco-2017?p=biformer-vision-transformer-with-bi-level)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/biformer-vision-transformer-with-bi-level/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=biformer-vision-transformer-with-bi-level)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/biformer-vision-transformer-with-bi-level/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=biformer-vision-transformer-with-bi-level)`

BiFormer: Vision Transformer with Bi-Level Routing Attention

CVPR 2023 · Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson Lau

As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via bi-level routing to enable a more flexible allocation of computations with content awareness. Specifically, for a query, irrelevant key-value pairs are first filtered out at a coarse region level, and then fine-grained token-to-token attention is applied in the union of remaining candidate regions (i.e., routed regions). We provide a simple yet effective implementation of the proposed bi-level routing attention, which utilizes the sparsity to save both computation and memory while involving only GPU-friendly dense matrix multiplications. Built with the proposed bi-level routing attention, a new general vision transformer, named BiFormer, is then presented. As BiFormer attends to a small subset of relevant tokens in a query-adaptive manner without distraction from other irrelevant ones, it enjoys both good performance and high computational efficiency, especially in dense prediction tasks. Empirical results across several computer vision tasks such as image classification, object detection, and semantic segmentation verify the effectiveness of our design. Code is available at https://github.com/rayleizhu/BiFormer.
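
The two stages described in the abstract (coarse region-level filtering, then token-to-token attention over the routed regions) can be illustrated with a minimal NumPy sketch. This is not the official implementation from the repository above. It assumes a single head, no learned projections, tokens already grouped into `R` regions of `n` tokens each, region descriptors obtained by mean pooling, and routing by top-k affinity. The names `bi_level_routing_attention`, `softmax`, and `topk` are hypothetical and chosen for this example only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bi_level_routing_attention(q, k, v, topk):
    """Illustrative sketch only. q, k, v have shape (R, n, d):
    R regions, n tokens per region, d channels."""
    R, n, d = q.shape

    # Coarse level: one descriptor per region (assumption: mean pooling).
    q_region = q.mean(axis=1)              # (R, d)
    k_region = k.mean(axis=1)              # (R, d)
    affinity = q_region @ k_region.T       # (R, R) region-to-region scores

    # Keep the topk most relevant regions for each query region.
    routed = np.argsort(-affinity, axis=-1)[:, :topk]   # (R, topk)

    # Gather the key/value tokens of the routed regions into dense tensors.
    k_gathered = k[routed].reshape(R, topk * n, d)
    v_gathered = v[routed].reshape(R, topk * n, d)

    # Fine level: ordinary token-to-token attention over the gathered tokens.
    scores = q @ k_gathered.transpose(0, 2, 1) / np.sqrt(d)  # (R, n, topk*n)
    return softmax(scores) @ v_gathered, routed


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 49, 32)) for _ in range(3))
out, routed = bi_level_routing_attention(q, k, v, topk=4)
print(out.shape, routed.shape)  # (16, 49, 32) (16, 4)
```

In this sketch each query token attends to `topk * n` keys instead of all `R * n`, and both stages reduce to dense matrix multiplications after the gather. Setting `topk = R` recovers full attention over every token.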

Code

rayleizhu/biformer (official, 418 GitHub stars)

Tasks

Computational Efficiency

Image Classification

Object Detection

Semantic Segmentation

Datasets

ImageNet

MS COCO

ADE20K

Results from the Paper

Ranked #9 on Object Detection on COCO 2017 (mAP metric)

