PVT v2: Improved Baselines with Pyramid Vision Transformer

25 Jun 2021  ·  Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Transformers have recently demonstrated encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (1) a linear-complexity attention layer, (2) overlapping patch embedding, and (3) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and achieves significant improvements on fundamental vision tasks such as classification, detection, and segmentation. Notably, the proposed PVT v2 achieves comparable or better performance than recent works such as Swin Transformer. We hope this work will facilitate state-of-the-art Transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
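To give a feel for two of the three designs, the sketch below illustrates (1) linear spatial-reduction attention, where keys and values are average-pooled to a fixed grid so attention cost grows linearly with the number of tokens, and (2) overlapping patch embedding, where padded patches overlap between neighbours. This is a simplified NumPy sketch under stated assumptions, not the paper's implementation: it is single-head, omits the learnable Q/K/V projections and the depthwise convolution that PVT v2 applies after pooling, assumes the spatial size is divisible by the pool size, and the function names are ours.

```python
import numpy as np

def linear_sra(x, h, w, pool=7):
    """Simplified linear spatial-reduction attention (no learned projections).

    x: (h*w, c) token sequence laid out row-major over an h x w grid.
    Assumes h and w are divisible by `pool` (PVT v2 uses adaptive pooling).
    """
    n, c = x.shape
    # Average-pool the token grid to a fixed pool x pool grid of keys/values,
    # so the attention matrix is (n, pool^2) instead of (n, n) -> linear in n.
    grid = x.reshape(h, w, c)
    ph, pw = h // pool, w // pool
    kv = grid.reshape(pool, ph, pool, pw, c).mean(axis=(1, 3)).reshape(pool * pool, c)
    # Scaled dot-product attention against the pooled keys/values.
    logits = x @ kv.T / np.sqrt(c)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ kv  # (n, c)

def overlap_patch_embed(img, patch=7, stride=4):
    """Extract overlapping patches (flattened; a real model would project them).

    img: (h, w, c). Zero-padding by patch//2 makes adjacent patches overlap,
    unlike the non-overlapping patch split in PVT v1 / ViT.
    """
    p = patch // 2
    padded = np.pad(img, ((p, p), (p, p), (0, 0)))
    win = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch), axis=(0, 1))
    win = win[::stride, ::stride]  # subsample windows -> (h/stride, w/stride, c, patch, patch)
    return win.reshape(win.shape[0] * win.shape[1], -1)

# Example: a 56x56 grid of 8-dim tokens attends to only 49 pooled tokens.
tokens = np.random.randn(56 * 56, 8)
out = linear_sra(tokens, 56, 56)        # shape (3136, 8)
patches = overlap_patch_embed(np.random.randn(224, 224, 3))  # shape (3136, 147)
```

With a 7x7 pooled grid, each of the 3136 query tokens attends to only 49 key/value tokens, which is what makes the attention cost linear in the input resolution rather than quadratic.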


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Object Detection | COCO minival | Sparse R-CNN (PVTv2-B2) | box AP | 50.1 | #76 |
| Object Detection | COCO minival | Sparse R-CNN (PVTv2-B2) | AP50 | 69.5 | #23 |
| Object Detection | COCO minival | Sparse R-CNN (PVTv2-B2) | AP75 | 54.9 | #18 |
| Object Detection | COCO-O | PVTv2-B5 (Mask R-CNN) | Average mAP | 28.2 | #23 |
| Object Detection | COCO-O | PVTv2-B5 (Mask R-CNN) | Effective Robustness | 6.85 | #17 |
| Image Classification | ImageNet | PVTv2-B3 | Top 1 Accuracy | 83.2% | #416 |
| Image Classification | ImageNet | PVTv2-B3 | Number of params | 45.2M | #707 |
| Image Classification | ImageNet | PVTv2-B3 | GFLOPs | 6.9 | #249 |
| Image Classification | ImageNet | PVTv2-B1 | Top 1 Accuracy | 78.7% | #750 |
| Image Classification | ImageNet | PVTv2-B1 | Number of params | 13.1M | #504 |
| Image Classification | ImageNet | PVTv2-B1 | GFLOPs | 2.1 | #151 |
| Image Classification | ImageNet | PVTv2-B0 | Top 1 Accuracy | 70.5% | #952 |
| Image Classification | ImageNet | PVTv2-B0 | Number of params | 3.4M | #373 |
| Image Classification | ImageNet | PVTv2-B0 | GFLOPs | 0.6 | #65 |
| Image Classification | ImageNet | PVTv2-B2 | Top 1 Accuracy | 82% | #534 |
| Image Classification | ImageNet | PVTv2-B2 | Number of params | 25.4M | #594 |
| Image Classification | ImageNet | PVTv2-B2 | GFLOPs | 4 | #191 |
| Image Classification | ImageNet | PVTv2-B4 | Top 1 Accuracy | 83.8% | #360 |
| Image Classification | ImageNet | PVTv2-B4 | Number of params | 82M | #807 |
| Image Classification | ImageNet | PVTv2-B4 | GFLOPs | 11.8 | #315 |
