HEAT: Head-level Parameter Efficient Adaptation of Vision Transformers with Taylor-expansion Importance Scores

13 Apr 2024 · Yibo Zhong, Yao Zhou ·

Prior computer vision research extensively explores adapting pre-trained vision transformers (ViT) to downstream tasks. However, the substantial number of parameters requiring adaptation has led to a focus on Parameter Efficient Transfer Learning (PETL) as an approach to efficiently adapt large pre-trained models by training only a subset of parameters, achieving both parameter and storage efficiency. Although the significantly reduced parameters have shown promising performance under transfer learning scenarios, the structural redundancy inherent in the model still leaves room for improvement, which warrants further investigation. In this paper, we propose Head-level Efficient Adaptation with Taylor-expansion importance score (HEAT): a simple method that efficiently fine-tuning ViTs at head levels. In particular, the first-order Taylor expansion is employed to calculate each head's importance score, termed Taylor-expansion Importance Score (TIS), indicating its contribution to specific tasks. Additionally, three strategies for calculating TIS have been employed to maximize the effectiveness of TIS. These strategies calculate TIS from different perspectives, reflecting varying contributions of parameters. Besides ViT, HEAT has also been applied to hierarchical transformers such as Swin Transformer, demonstrating its versatility across different transformer architectures. Through extensive experiments, HEAT has demonstrated superior performance over state-of-the-art PETL methods on the VTAB-1K benchmark.

PDF Abstract