Visual Prompt Tuning

19 papers with code • 4 benchmarks • 0 datasets

Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are simply prepended to the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. It is also competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, the results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
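
A minimal PyTorch sketch of this mechanism (the "deep" variant, with fresh prompts at every layer), assuming a timm-style ViT exposing `patch_embed`, `cls_token`, `pos_embed`, `blocks`, and `norm`; the attribute names are assumptions, and only the prompts and the linear head receive gradients:

```python
import torch
import torch.nn as nn

class VPTDeep(nn.Module):
    """Sketch of VPT-deep: learnable prompts are prepended to the token
    sequence at every Transformer layer; the backbone stays frozen."""

    def __init__(self, backbone, num_prompts=10, embed_dim=768, num_classes=100):
        super().__init__()
        self.backbone = backbone                       # pre-trained ViT-like model
        for p in self.backbone.parameters():           # freeze the entire backbone
            p.requires_grad = False
        num_layers = len(backbone.blocks)              # assumes timm-style .blocks
        # per-layer prompt tokens: the only new parameters besides the head
        self.prompts = nn.Parameter(
            torch.randn(num_layers, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)  # linear head, trained jointly

    def forward(self, x):
        b = x.shape[0]
        x = self.backbone.patch_embed(x)               # (B, N, D) patch tokens
        cls = self.backbone.cls_token.expand(b, -1, -1)
        x = torch.cat([cls, x], dim=1) + self.backbone.pos_embed
        n_p = self.prompts.shape[1]
        for i, blk in enumerate(self.backbone.blocks):
            p = self.prompts[i].expand(b, -1, -1)      # this layer's prompts
            x = torch.cat([x[:, :1], p, x[:, 1:]], dim=1)      # insert after [CLS]
            x = blk(x)
            x = torch.cat([x[:, :1], x[:, 1 + n_p:]], dim=1)   # discard prompt outputs
        x = self.backbone.norm(x)
        return self.head(x[:, 0])                      # classify from [CLS] token
```

Because only `self.prompts` and `self.head` require gradients, the optimizer touches a tiny fraction of the model, which is what makes VPT attractive in the low-data regime described above.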

Most implemented papers

Visual Prompt Tuning

KMnP/vpt 23 Mar 2022

The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning.

Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models

zyh16143998882/iccv23-idpt ICCV 2023

To overcome this limitation, we propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models.

Understanding Zero-Shot Adversarial Robustness for Large-Scale Models

cvlab-columbia/zsrobust4foundationmodel 14 Dec 2022

We apply this training loss to two adaptation methods, model fine-tuning and visual prompt tuning.

Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model

fanrena/dpt 17 Aug 2022

To make the final image feature concentrate more on the target visual concept, a Class-Aware Visual Prompt Tuning (CAVPT) scheme is further proposed in our DPT. The class-aware visual prompt is generated dynamically by performing cross-attention between text prompt features and image patch token embeddings, encoding both downstream task-related information and visual instance information.
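
As a rough illustration of how such a class-aware prompt could be produced, here is a minimal cross-attention sketch; the module name, the query/key roles (text prompt features as queries, patch tokens as keys and values), and all shapes are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class ClassAwarePromptGenerator(nn.Module):
    """Hypothetical CAVPT-style module: text prompt features attend to
    image patch tokens to produce class-aware visual prompts."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_feats, patch_tokens):
        # text_feats:   (B, C, D) one feature per class-name text prompt
        # patch_tokens: (B, N, D) image patch token embeddings
        prompts, _ = self.cross_attn(query=text_feats,
                                     key=patch_tokens,
                                     value=patch_tokens)
        return prompts  # (B, C, D) class-aware visual prompts

# the generated prompts would then be appended to the visual token sequence
# so the image encoder's final feature concentrates on the target concept
```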

Visual Prompt Tuning for Generative Transfer Learning

google-research/generative_transfer CVPR 2023

We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens for autoregressive or non-autoregressive transformers.

Unified Vision and Language Prompt Learning

yuhangzang/upt 13 Oct 2022

Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP.

Multitask Vision-Language Prompt Tuning

sincerass/mvlpt 21 Nov 2022

Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; and (ii) we show that many target tasks can benefit from sharing prompt vectors and can thus be learned jointly via multitask prompt tuning.
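
A schematic of this two-stage recipe in PyTorch; the `loss_with_prompt` task API, the `.name` attribute, and both function names are hypothetical placeholders, not the authors' code:

```python
import torch
import torch.nn as nn

def learn_shared_prompt(source_tasks, num_prompts=16, dim=512, steps=100):
    """Stage (i): optimize one prompt jointly over all source tasks."""
    shared = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
    opt = torch.optim.Adam([shared], lr=1e-3)
    for _ in range(steps):
        for task in source_tasks:
            loss = task.loss_with_prompt(shared)  # hypothetical task API
            opt.zero_grad()
            loss.backward()
            opt.step()
    return shared.detach()

def init_target_prompts(shared, target_tasks):
    """Stage (ii): each target task starts from its own copy of the prompt."""
    return {task.name: nn.Parameter(shared.clone()) for task in target_tasks}
```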

Improving Visual Prompt Tuning for Self-supervised Vision Transformers

ryongithub/gatedprompttuning 8 Jun 2023

Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks.

TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception

NYCU-MAPL/TransTIC ICCV 2023

This work aims to transfer a Transformer-based image compression codec from human perception to machine perception without fine-tuning the codec.

Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

oppo-us-research/USST ICCV 2023

In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in 3D space from early observations of first-person RGB videos.