19 Apr 2024 • Juncheng Yang, Zuchao Li, Shuai Xie, WeiPing Zhu, Wei Yu, Shijun Li
While some methods avoid training by leveraging an image-modality cache and retrieval, they overlook the importance of the text modality and of cross-modal cues for the parameter-efficient adaptation of visual-language models.