1 code implementation • 17 Mar 2024 • Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu
Meanwhile, our RebQ leverages the extensive multi-modal knowledge of pre-trained LMMs to reconstruct the missing-modality data.
no code implementations • 3 Oct 2022 • Xiaohan Zou, Tong Lin
However, they still suffer from catastrophic forgetting in the continual learning setting, since data from previous tasks are no longer available.
no code implementations • 28 Sep 2022 • Xiaohan Zou, Changqiao Wu, Lele Cheng, Zhongyuan Wang
Most existing vision-language retrieval methods match the two modalities in one of three ways: comparing their global feature vectors, which discards fine-grained information and lacks interpretability; detecting objects in images or videos and aligning the text with their fine-grained features, which relies on complicated model designs; or modeling fine-grained interaction via cross-attention over visual and textual tokens, which suffers from inferior efficiency.