Visual Prompt Tuning

The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small number of trainable parameters (less than 1% of model parameters) in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
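The core idea, prepending a small set of trainable prompt tokens to the patch-token sequence while the Transformer backbone stays frozen, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the names (`make_prompts`, `vpt_shallow_input`), the embedding dimension, and the prompt count are all assumptions chosen for clarity.

```python
import random

EMBED_DIM = 8      # hypothetical embedding dimension (assumption, ViT-B uses 768)
NUM_PROMPTS = 5    # hypothetical number of trainable prompt tokens

def make_prompts(num_prompts, dim, seed=0):
    """Randomly initialised prompt tokens; in VPT these are the only
    backbone-side parameters that receive gradient updates."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_prompts)]

def vpt_shallow_input(cls_token, prompts, patch_tokens):
    """Build the encoder input sequence [CLS; prompts; patches].
    In VPT-Shallow, prompts are inserted only before the first layer;
    VPT-Deep would add a fresh set of prompts at every layer's input."""
    return [cls_token] + prompts + patch_tokens

# Toy frozen inputs: a CLS token and 196 patch embeddings (14x14 grid)
cls = [0.0] * EMBED_DIM
patches = [[0.0] * EMBED_DIM for _ in range(196)]
prompts = make_prompts(NUM_PROMPTS, EMBED_DIM)

seq = vpt_shallow_input(cls, prompts, patches)
print(len(seq))  # 1 + NUM_PROMPTS + 196 = 202
```

The frozen backbone then processes this extended sequence exactly as it would a plain ViT input; only the prompt vectors (and a task head) are optimised, which is what keeps the per-task storage cost small.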


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Long-tail Learning | CIFAR-100-LT (ρ=10) | VPT | Error Rate | 10.4 | # 3 |
| Long-tail Learning | CIFAR-100-LT (ρ=100) | VPT | Error Rate | 19 | # 4 |
| Long-tail Learning | CIFAR-100-LT (ρ=50) | VPT | Error Rate | 15.2 | # 3 |
| Visual Prompt Tuning | FGVC | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 79.26 | # 6 |
| Visual Prompt Tuning | FGVC | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 72.02 | # 9 |
| Visual Prompt Tuning | FGVC | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.12 | # 4 |
| Visual Prompt Tuning | FGVC | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 57.84 | # 10 |
| Prompt Engineering | ImageNet-21k | VPT | Accuracy | 24.8 | # 2 |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 67.34 | # 5 |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 70.27 | # 4 |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 36.02 | # 10 |
| Visual Prompt Tuning | VTAB-1k (Natural<7>) | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 39.96 | # 9 |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 83.04 | # 5 |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 69.65 | # 9 |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 60.61 | # 10 |
| Visual Prompt Tuning | VTAB-1k (Specialized<4>) | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 82.26 | # 6 |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | VPT-Shallow (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 37.55 | # 7 |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | VPT-Shallow (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 27.50 | # 9 |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | VPT-Deep (ViT-B/16_MAE_pretrained_ImageNet-1K) | Mean Accuracy | 26.57 | # 10 |
| Visual Prompt Tuning | VTAB-1k (Structured<8>) | VPT-Deep (ViT-B/16_MoCo_v3_pretrained_ImageNet-1K) | Mean Accuracy | 42.38 | # 6 |
