1 code implementation • 16 Feb 2024 • Shengzhi Li, Rongyu Lin, Shichao Pei
In conclusion, we propose a distillation-based multi-modal alignment model that uses fine-grained annotations on a small dataset to reconcile the textual and visual performance of MLLMs, restoring and boosting language capability after visual instruction tuning.
1 code implementation • 7 Aug 2023 • Shengzhi Li, Nima Tajbakhsh
We asked GPT-4 to assess the matching quality of our question-answer turns given the paper's context, obtaining an average rating of 8.7/10 on our 3K test set.
Ranked #1 on Visual Question Answering (VQA) on SciGraphQA