DoRA: Weight-Decomposed Low-Rank Adaptation

Among widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often remains between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to approach the learning capacity of FT based on these findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, and employs LoRA for the directional updates to keep the number of trainable parameters small. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA when fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code is available at https://github.com/NVlabs/DoRA.
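
The magnitude/direction split described in the abstract can be illustrated with a small sketch. The abstract does not spell out the formula, so the code below makes several assumptions: the magnitude is taken as the L2 norm of each weight column, the direction is the pre-trained weight plus a LoRA-style low-rank product, and the low-rank factor `lora_b` starts at zero. All function and variable names are hypothetical and are not taken from the NVlabs/DoRA code.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_norms(w):
    """Return the L2 norm of each column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, magnitude, lora_b, lora_a):
    """Recompose a weight as per-column magnitude times unit-norm direction.

    Assumption: the direction is w0 + lora_b @ lora_a, normalised column-wise.
    """
    delta = matmul(lora_b, lora_a)
    directed = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]
    norms = column_norms(directed)
    return [[m * v / n for v, m, n in zip(row, magnitude, norms)] for row in directed]


random.seed(0)
d_out, d_in, rank = 4, 3, 2

# Frozen pre-trained weight (random values stand in for a real layer).
w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# Trainable pieces: one magnitude per column, plus the two low-rank factors.
magnitude = column_norms(w0)                    # initialised from w0
lora_b = [[0.0] * rank for _ in range(d_out)]   # zero init: no change at start
lora_a = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]

w_init = dora_weight(w0, magnitude, lora_b, lora_a)
trainable = d_in + rank * (d_out + d_in)        # magnitude + low-rank factors
```

Under these assumptions the recomposed weight equals the pre-trained weight before any training step, and the recomposed matrix can be merged into a single dense weight afterwards, which is consistent with the claim of no additional inference overhead.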


Results from the Paper


Ranked #2 on parameter-efficient fine-tuning on WinoGrande (using extra training data)

Task                             Dataset     Model      Metric Name   Metric Value  Global Rank
parameter-efficient fine-tuning  BoolQ       LLaMA2-7b  Accuracy (%)  81.93         #3
parameter-efficient fine-tuning  HellaSwag   LLaMA2-7b  Accuracy (%)  76.27         #3
parameter-efficient fine-tuning  WinoGrande  LLaMA2-7b  Accuracy (%)  70.09         #2
