PVT v2: Improved Baselines with Pyramid Vision Transformer

25 Jun 2021  ·  Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Transformers have recently shown encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (1) a linear-complexity attention layer, (2) overlapping patch embedding, and (3) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and achieves significant improvements on fundamental vision tasks such as classification, detection, and segmentation. Notably, the proposed PVT v2 achieves comparable or better performance than recent works such as Swin Transformer. We hope this work will facilitate state-of-the-art Transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
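The three designs can be sketched in PyTorch as below. This is a minimal illustration, not the authors' exact code: module names, channel sizes, and hyperparameters (7×7 overlapping patches with stride 4, a 3×3 depth-wise conv inside the FFN, a 7×7 average pool producing fixed-size keys/values for linear attention) are chosen to match the paper's description but are otherwise assumptions.

```python
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    """Overlapping patch embedding: stride < kernel size, so adjacent
    patches overlap and local continuity is preserved."""
    def __init__(self, in_ch=3, embed_dim=64, patch=7, stride=4):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, patch, stride, patch // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                      # (B, C, H, W)
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        return self.norm(x), H, W

class ConvFFN(nn.Module):
    """Feed-forward network with a 3x3 depth-wise conv between the two
    linear layers, injecting local position information."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, 1, 1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):
        x = self.fc1(x)
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.dwconv(x).flatten(2).transpose(1, 2)
        return self.fc2(self.act(x))

class LinearSRA(nn.Module):
    """Linear spatial-reduction attention: keys/values come from a
    fixed-size (pool x pool) average pool of the feature map, so the
    attention cost grows linearly with the number of query tokens."""
    def __init__(self, dim, heads=1, pool=7):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.pool = nn.AdaptiveAvgPool2d(pool)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        d = C // self.heads
        q = self.q(x).reshape(B, N, self.heads, d).transpose(1, 2)
        y = x.transpose(1, 2).reshape(B, C, H, W)
        y = self.pool(y).flatten(2).transpose(1, 2)   # fixed-length K/V tokens
        kv = self.kv(y).reshape(B, -1, 2, self.heads, d).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

A PVT v2 block stacks LinearSRA and ConvFFN (each with a residual connection and LayerNorm), and each stage begins with an OverlapPatchEmbed that downsamples the feature map.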


Results from the Paper


| Task                 | Dataset      | Model                   | Metric               | Value | Global Rank |
|----------------------|--------------|-------------------------|----------------------|-------|-------------|
| Object Detection     | COCO minival | Sparse R-CNN (PVTv2-B2) | box AP               | 50.1  | # 76        |
| Object Detection     | COCO minival | Sparse R-CNN (PVTv2-B2) | AP50                 | 69.5  | # 23        |
| Object Detection     | COCO minival | Sparse R-CNN (PVTv2-B2) | AP75                 | 54.9  | # 18        |
| Object Detection     | COCO-O       | PVTv2-B5 (Mask R-CNN)   | Average mAP          | 28.2  | # 23        |
| Object Detection     | COCO-O       | PVTv2-B5 (Mask R-CNN)   | Effective Robustness | 6.85  | # 17        |
| Image Classification | ImageNet     | PVTv2-B0                | Top 1 Accuracy       | 70.5% | # 948       |
| Image Classification | ImageNet     | PVTv2-B0                | Number of params     | 3.4M  | # 372       |
| Image Classification | ImageNet     | PVTv2-B0                | GFLOPs               | 0.6   | # 65        |
| Image Classification | ImageNet     | PVTv2-B1                | Top 1 Accuracy       | 78.7% | # 746       |
| Image Classification | ImageNet     | PVTv2-B1                | Number of params     | 13.1M | # 506       |
| Image Classification | ImageNet     | PVTv2-B1                | GFLOPs               | 2.1   | # 151       |
| Image Classification | ImageNet     | PVTv2-B2                | Top 1 Accuracy       | 82%   | # 530       |
| Image Classification | ImageNet     | PVTv2-B2                | Number of params     | 25.4M | # 595       |
| Image Classification | ImageNet     | PVTv2-B2                | GFLOPs               | 4     | # 191       |
| Image Classification | ImageNet     | PVTv2-B3                | Top 1 Accuracy       | 83.2% | # 413       |
| Image Classification | ImageNet     | PVTv2-B3                | Number of params     | 45.2M | # 708       |
| Image Classification | ImageNet     | PVTv2-B3                | GFLOPs               | 6.9   | # 248       |
| Image Classification | ImageNet     | PVTv2-B4                | Top 1 Accuracy       | 83.8% | # 358       |
| Image Classification | ImageNet     | PVTv2-B4                | Number of params     | 82M   | # 808       |
| Image Classification | ImageNet     | PVTv2-B4                | GFLOPs               | 11.8  | # 313       |
