Visual Prompt Tuning
34 papers with code • 4 benchmarks • 1 dataset
Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while freezing the entire pre-trained Transformer backbone during downstream training. In practice, these additional parameters are simply prepended to the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. Finally, VPT is competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, the results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
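The mechanism is simple enough to sketch. Below is a minimal PyTorch illustration of the deep variant, where a fresh set of prompt tokens is prepended at every layer; the class name `PromptedTransformer` and the stand-in encoder blocks are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PromptedTransformer(nn.Module):
    """VPT-style wrapper: learnable prompt tokens are prepended to the token
    sequence at every layer while the pre-trained backbone stays frozen."""

    def __init__(self, backbone_layers, embed_dim=768, num_prompts=10, num_classes=100):
        super().__init__()
        self.layers = backbone_layers                  # pre-trained Transformer blocks
        for p in self.layers.parameters():             # freeze the entire backbone
            p.requires_grad = False
        # one prompt set per layer (deep variant); a single set prepended only
        # at the first layer would be the shallow variant
        self.prompts = nn.ParameterList(
            [nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)
             for _ in range(len(self.layers))]
        )
        self.head = nn.Linear(embed_dim, num_classes)  # linear head, trained

    def forward(self, x):                              # x: (batch, seq, embed_dim)
        b, n = x.size(0), self.prompts[0].size(0)
        for layer, prompt in zip(self.layers, self.prompts):
            tokens = torch.cat([prompt.unsqueeze(0).expand(b, -1, -1), x], dim=1)
            x = layer(tokens)[:, n:, :]                # run block, drop prompt outputs
        return self.head(x.mean(dim=1))                # pool patch tokens, classify

# usage: toy encoder blocks stand in for a real pre-trained ViT
blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
     for _ in range(2)]
)
model = PromptedTransformer(blocks, num_classes=10)
logits = model(torch.randn(4, 196, 768))               # -> (4, 10)
```

Only `self.prompts` and `self.head` receive gradients, so the number of trained parameters is a tiny fraction of the backbone's.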
Most implemented papers
Visual Prompt Tuning
The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning.
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
To overcome this limitation, we propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models.
Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition
To address this issue, we propose Random SAM prompt tuning (RSAM-PT), a method that improves model generalization while requiring only a single gradient computation per step.
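The excerpt suggests a SAM-style flat-minima update in which the perturbation is drawn from a Gaussian neighborhood rather than computed by an extra ascent gradient, so each step needs only one backward pass. A hedged sketch under that reading, with a toy function name (`rsam_prompt_step`) and a toy loss:

```python
import torch

def rsam_prompt_step(prompt, loss_fn, lr=0.01, sigma=0.05):
    """Hypothetical RSAM-PT-style step: evaluate the loss at a randomly
    perturbed prompt (a Gaussian neighborhood sample) and apply the resulting
    gradient to the unperturbed prompt -- one gradient computation per step,
    unlike standard SAM's two."""
    noise = torch.randn_like(prompt) * sigma              # random neighborhood sample
    perturbed = (prompt + noise).detach().requires_grad_(True)
    loss = loss_fn(perturbed)                             # single forward/backward pass
    loss.backward()
    with torch.no_grad():
        prompt -= lr * perturbed.grad                     # update the original prompt
    return loss.item()

# usage: toy quadratic loss standing in for the real objective
p = torch.zeros(10, 768)
rsam_prompt_step(p, lambda q: (q ** 2).sum())
```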
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
We apply this training loss to two adaption methods, model finetuning and visual prompt tuning.
Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
To make the final image feature concentrate more on the target visual concept, our DPT further includes a Class-Aware Visual Prompt Tuning (CAVPT) scheme: the class-aware visual prompt is generated dynamically by cross attention between text prompt features and image patch token embeddings, encoding both downstream task-related information and visual instance information.
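A minimal sketch of the cross-attention idea behind CAVPT, with text prompt features as queries and image patch tokens as keys/values; the module name, dimensions, and head count are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ClassAwareVisualPrompt(nn.Module):
    """Sketch of a CAVPT-style generator: class-aware visual prompts come
    from cross attention, with text prompt features as queries and image
    patch token embeddings as keys and values."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_prompt_feats, patch_tokens):
        # text_prompt_feats: (batch, num_classes, dim) -- queries
        # patch_tokens:      (batch, num_patches, dim) -- keys and values
        prompts, _ = self.cross_attn(text_prompt_feats, patch_tokens, patch_tokens)
        return prompts  # one class-aware visual prompt per class

# usage: random stand-in features
gen = ClassAwareVisualPrompt(dim=512)
vp = gen(torch.randn(2, 10, 512), torch.randn(2, 196, 512))  # -> (2, 10, 512)
```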
Visual Prompt Tuning for Generative Transfer Learning
We base our framework on state-of-the-art generative vision transformers, which represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers.
Unified Vision and Language Prompt Learning
Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP.
Multitask Vision-Language Prompt Tuning
Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task (sketched below); (ii) we show that many target tasks can benefit from sharing prompt vectors with one another and thus can be jointly learned via multitask prompt tuning.
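Point (i) amounts to a two-stage recipe: train one prompt on the pooled source tasks, then clone it to initialize each target task's prompt. A toy sketch with stand-in quadratic losses in place of the paper's real task objectives:

```python
import torch
import torch.nn as nn

# Stage 1: learn one shared prompt across several source tasks.
dim, num_prompts = 768, 8
shared = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
opt = torch.optim.SGD([shared], lr=0.1)

# Toy per-task losses; real source tasks would each supply their own objective.
source_losses = [lambda p: (p - 1).pow(2).mean(),
                 lambda p: (p + 1).pow(2).mean()]
for _ in range(100):                          # multitask training on source tasks
    opt.zero_grad()
    loss = sum(f(shared) for f in source_losses)
    loss.backward()
    opt.step()

# Stage 2: each target task's prompt starts from the transferable source prompt.
target_prompts = {t: nn.Parameter(shared.detach().clone())
                  for t in ["task_a", "task_b"]}
```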
Learning Disentangled Prompts for Compositional Image Synthesis
We study domain-adaptive image synthesis, the problem of teaching pretrained image generative models a new style or concept from as few as one image in order to synthesize novel images, to better understand compositional image synthesis.
Improving Visual Prompt Tuning for Self-supervised Vision Transformers
Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks.