Visual Prompt Tuning

19 papers with code • 4 benchmarks • 0 datasets

Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are simply prepended to the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. Finally, VPT is competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, these results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
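As a concrete illustration, here is a minimal PyTorch sketch of the deep-prompting variant: per-layer prompt tokens are prepended to the token sequence, the backbone stays frozen, and only the prompts plus a linear head receive gradients. Names such as `PromptedEncoder` and `num_prompts` are illustrative, and `nn.TransformerEncoderLayer` stands in for a real pre-trained ViT block; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """VPT-Deep-style sketch: learnable prompt tokens are prepended to the
    input sequence of every Transformer layer; the backbone stays frozen."""
    def __init__(self, layers, embed_dim=768, num_prompts=10):
        super().__init__()
        self.layers = layers
        for p in self.layers.parameters():
            p.requires_grad = False                      # freeze pre-trained backbone
        # one independent set of prompt tokens per layer
        self.prompts = nn.Parameter(
            torch.randn(len(layers), num_prompts, embed_dim) * 0.02)

    def forward(self, x):                                # x: (B, N, D) embedded tokens
        n_p = self.prompts.shape[1]
        for i, layer in enumerate(self.layers):
            prompt = self.prompts[i].expand(x.size(0), -1, -1)
            x = layer(torch.cat([prompt, x], dim=1))     # prepend, then run the layer
            x = x[:, n_p:, :]                            # discard prompt outputs; the
        return x                                         # next layer gets fresh prompts

# Only the prompts and the task head are trained.
layers = nn.ModuleList(nn.TransformerEncoderLayer(768, 12, batch_first=True)
                       for _ in range(12))
encoder, head = PromptedEncoder(layers), nn.Linear(768, 100)
logits = head(encoder(torch.randn(2, 197, 768))[:, 0])  # classify from the [CLS] token
optimizer = torch.optim.AdamW([encoder.prompts, *head.parameters()], lr=1e-3)
```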

Most implemented papers

E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

chenghan111/e2vpt ICCV 2023

Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning.
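The abstract describes prompts concatenated to the keys and values inside self-attention; below is a minimal PyTorch sketch of that idea under stated assumptions. The paper's additional prompt pruning is omitted, and `KVPromptedAttention` and its hyperparameters are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVPromptedAttention(nn.Module):
    """Sketch of key-value prompting: small learnable prompt sets are concatenated
    to the keys and values only, so the output sequence length is unchanged."""
    def __init__(self, dim=768, num_heads=12, num_kv_prompts=5):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.k_prompt = nn.Parameter(torch.randn(num_kv_prompts, dim) * 0.02)
        self.v_prompt = nn.Parameter(torch.randn(num_kv_prompts, dim) * 0.02)

    def forward(self, x):                                # x: (B, N, D)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        k = torch.cat([self.k_prompt.expand(B, -1, -1), k], dim=1)
        v = torch.cat([self.v_prompt.expand(B, -1, -1), v], dim=1)
        # split heads: (B, L, D) -> (B, heads, L, head_dim)
        split = lambda t: t.view(B, t.size(1), self.num_heads,
                                 self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        return self.proj(out.transpose(1, 2).reshape(B, N, D))
```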

Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning

moonjunyyy/si-blurry ICCV 2023

In addition, to alleviate the class imbalance problem, we introduce a new gradient similarity-based focal loss and adaptive feature scaling to ease overfitting to the major classes and underfitting to the minor classes.
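For orientation on the loss side, here is a plain class-weighted focal loss in PyTorch. The paper's specific gradient-similarity weighting is not reproduced; the fixed `class_weights` argument is only a stand-in for it, so treat this as a generic sketch rather than the proposed loss.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, class_weights=None):
    """Plain focal loss: (1 - p_t)^gamma down-weights confidently classified
    (typically majority-class) samples so minority classes drive the gradient."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # per-sample CE
    pt = ce.neg().exp()                                          # prob. of true class
    loss = (1.0 - pt) ** gamma * ce
    if class_weights is not None:
        loss = loss * class_weights[targets]
    return loss.mean()

# e.g. focal_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```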

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

ubc-tea/SGPT 27 Oct 2023

Existing Generalized FL (GFL) and Personalized FL (PFL) methods have limitations in balancing performance across both global and local data distributions.

TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding

tb2-sy/tsp-transformer 6 Nov 2023

Holistic scene understanding includes semantic segmentation, surface normal estimation, object boundary detection, depth estimation, etc.

SA$^2$VP: Spatially Aligned-and-Adapted Visual Prompt

tommy-xq/sa2vp 16 Dec 2023

Typical methods for visual prompt tuning follow the sequential modeling paradigm stemming from NLP, which represents an input image as a flattened sequence of token embeddings and then learns a set of unordered parameterized tokens prefixed to the sequence representation as the visual prompts for task adaptation of large vision models.

Revisiting the Power of Prompt for Visual Tuning

wangyz1608/self-prompt-tuning 4 Feb 2024

Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes.
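A hedged sketch of what prototype-based prompt initialization could look like: patch tokens from the downstream data are pooled and clustered, and the centroids seed the prompts. The k-means construction and the assumption that `backbone(images)` returns patch tokens of shape (B, N, D) are illustrative choices, and the paper's exact prototype construction may differ.

```python
import torch

@torch.no_grad()
def prototype_prompt_init(backbone, loader, num_prompts, device="cpu", max_tokens=10000):
    """Initialize prompt tokens from downstream patch-token prototypes
    (here: k-means centroids over pooled patch embeddings)."""
    feats, count = [], 0
    for images, _ in loader:
        tokens = backbone(images.to(device))             # (B, N, D), assumed API
        feats.append(tokens.flatten(0, 1))
        count += feats[-1].size(0)
        if count >= max_tokens:                          # cap the sample pool
            break
    feats = torch.cat(feats)[:max_tokens]
    # a few Lloyd iterations of k-means over the pooled patch tokens
    centroids = feats[torch.randperm(feats.size(0))[:num_prompts]].clone()
    for _ in range(10):
        assign = torch.cdist(feats, centroids).argmin(dim=1)
        for k in range(num_prompts):
            members = feats[assign == k]
            if members.numel():
                centroids[k] = members.mean(dim=0)
    return torch.nn.Parameter(centroids)                 # (num_prompts, D) prompt init
```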

CoLLaVO: Crayon Large Language and Vision mOdel

ByungKwanLee/CoLLaVO 17 Feb 2024

Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on vision language (VL) tasks.

CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning

Cuzyoung/CoDA 26 Mar 2024

SAVPT features a novel metric, Severity, which divides all adverse-scene images into low-severity and high-severity images.

ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning

clovaai/ECLIPSE 29 Mar 2024

Panoptic segmentation, combining semantic and instance segmentation, stands as a cutting-edge computer vision task.