Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts

18 Sep 2023 · Jiang-Xin Shi, Tong Wei, Zhi Zhou, Jie-Jing Shao, Xin-Yan Han, Yu-Feng Li

The fine-tuning paradigm for addressing long-tail learning tasks has sparked significant interest since the emergence of foundation models. Nonetheless, how fine-tuning impacts performance in long-tail learning has not been explicitly quantified. In this paper, we disclose that heavy fine-tuning may even lead to non-negligible performance deterioration on tail classes, and that lightweight fine-tuning is more effective. The reason is attributed to inconsistent class conditions caused by heavy fine-tuning. Based on this observation, we develop LIFT, a low-complexity and accurate long-tail learning algorithm, with the goal of facilitating fast prediction and compact models through adaptive lightweight fine-tuning. Experiments clearly verify that both the training time and the number of learned parameters are significantly reduced, while predictive performance is more accurate than that of state-of-the-art approaches.

| Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
|---|---|---|---|---|---|
| Long-tail Learning | CIFAR-100-LT (ρ=10) | LIFT (ViT-B/16, CLIP) | Error Rate | 15.1 | #4 |
| Long-tail Learning | CIFAR-100-LT (ρ=10) | LIFT (ViT-B/16, ImageNet-21K pre-training) | Error Rate | 8.7 | #1 |
| Long-tail Learning | CIFAR-100-LT (ρ=100) | LIFT (ViT-B/16, CLIP) | Error Rate | 18.3 | #3 |
| Long-tail Learning | CIFAR-100-LT (ρ=100) | LIFT (ViT-B/16, ImageNet-21K pre-training) | Error Rate | 10.9 | #1 |
| Long-tail Learning | CIFAR-100-LT (ρ=50) | LIFT (ViT-B/16, ImageNet-21K pre-training) | Error Rate | 9.8 | #1 |
| Long-tail Learning | CIFAR-100-LT (ρ=50) | LIFT (ViT-B/16, CLIP) | Error Rate | 16.9 | #4 |
| Long-tail Learning | ImageNet-LT | LIFT (ViT-B/16) | Top-1 Accuracy | 78.3 | #4 |
| Long-tail Learning | ImageNet-LT | LIFT (ViT-L/14) | Top-1 Accuracy | 82.9 | #1 |
| Long-tail Learning | iNaturalist 2018 | LIFT (ViT-B/16) | Top-1 Accuracy | 80.4 | #5 |
| Long-tail Learning | iNaturalist 2018 | LIFT (ViT-L/14) | Top-1 Accuracy | 85.2 | #2 |
| Long-tail Learning | iNaturalist 2018 | LIFT (ViT-L/14@336px) | Top-1 Accuracy | 87.4 | #1 |
| Long-tail Learning | Places-LT | LIFT (ViT-B/16) | Top-1 Accuracy | 52.2 | #2 |
| Long-tail Learning | Places-LT | LIFT (ViT-L/14) | Top-1 Accuracy | 53.7 | #1 |
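
The ρ in the CIFAR-100-LT dataset names is the imbalance ratio: the number of training images in the largest class divided by the number in the smallest. This page does not say how the splits were built, so the following is only a minimal sketch, assuming the exponential decay profile commonly used for CIFAR-LT benchmarks and the standard 500 training images per class in CIFAR-100. The function name `long_tail_class_counts` is hypothetical, and rounding to the nearest integer is a choice made here for illustration.

```python
def long_tail_class_counts(n_max: int, num_classes: int, rho: float) -> list[int]:
    """Per-class sample counts under an assumed exponential imbalance profile.

    Class 0 keeps n_max samples and the last class keeps about n_max / rho.
    """
    counts = []
    for i in range(num_classes):
        # Position of class i between the head (0.0) and the tail (1.0).
        frac = i / (num_classes - 1)
        counts.append(round(n_max * (1.0 / rho) ** frac))
    return counts


# CIFAR-100 has 500 training images per class before subsampling.
counts = long_tail_class_counts(n_max=500, num_classes=100, rho=100)
print(counts[0], counts[-1], sum(counts))
```

Under these assumptions, a larger ρ leaves fewer images in the tail classes, which is the regime where the abstract reports that heavy fine-tuning degrades performance.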

Methods