no code implementations • 12 Mar 2025 • Xiaoda Yang, Junyu Lu, Hongshun Qiu, Sijing Li, Hao Li, Shengpeng Ji, Xudong Tang, Jiayang Xu, Jiaqi Duan, Ziyue Jiang, Cong Lin, Sihang Cai, Zejian Xie, Zhuoyang Song, Songxin Zhang
Vision-Language Models (VLMs) based on Mixture-of-Experts (MoE) architectures have emerged as a pivotal paradigm in multimodal understanding, offering a powerful framework for integrating visual and linguistic information.
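The top-k routing at the heart of such MoE layers is easy to sketch. Below is a minimal, self-contained PyTorch illustration; the hidden size, expert count, expert MLP shape, and k=2 routing are assumptions chosen for clarity, not the configuration studied in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Illustrative top-k Mixture-of-Experts block (sizes are placeholders)."""
    def __init__(self, dim=512, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # router: scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x)                       # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # route each token to its k best experts
        weights = F.softmax(weights, dim=-1)        # normalize the selected gate scores
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

A call like MoELayer()(torch.randn(10, 512)) sends each of the 10 tokens through its 2 highest-scoring experts and mixes their outputs with the softmax-normalized gate weights, which is the sparsity that makes MoE scaling attractive.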
no code implementations • 27 Jan 2024 • Yuxin Liang, Zhuoyang Song, Hao Wang, Jiaxing Zhang
We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a key factor in countering factual hallucination and ensuring reliable application of LLMs.
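One way such a probe can be operationalized, sketched here under assumptions rather than as the paper's protocol: instruct the model to refuse when unsure, then score its answers and refusals against questions whose answers (or unanswerability) are known. The generate function is a hypothetical placeholder for any LLM completion call.

def generate(prompt: str) -> str:
    # Placeholder for an actual LLM client call (hypothetical, not the paper's API).
    raise NotImplementedError("plug in your LLM client here")

PROBE = (
    "Answer the question only if you are confident; otherwise reply exactly "
    "'UNKNOWN'.\nQuestion: {q}\nAnswer:"
)

def knowledge_state_accuracy(items):
    """items: list of (question, gold) pairs; gold is None when the model
    should refuse, otherwise the reference answer string."""
    correct = 0
    for question, gold in items:
        reply = generate(PROBE.format(q=question)).strip()
        refused = reply.upper().startswith("UNKNOWN")
        if gold is None:
            correct += refused  # credit an honest refusal
        else:
            correct += (not refused) and gold.lower() in reply.lower()
    return correct / len(items)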
no code implementations • 8 Dec 2023 • Junyu Lu, Dixiang Zhang, Songxin Zhang, Zejian Xie, Zhuoyang Song, Cong Lin, Jiaxing Zhang, BingYi Jing, Pingjian Zhang
During the instruction fine-tuning stage, we introduce semantic-aware visual feature extraction, a crucial method that enables the model to extract informative features from concrete visual objects (see the sketch after this entry).
Ranked #1 on Image Captioning on nocaps entire
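"Semantic-aware visual feature extraction" suggests pooling image features under the guidance of the instruction text. Below is a minimal sketch of one such mechanism, cross-attention from learned, text-conditioned queries onto image patch features; every shape and design choice here is an illustrative assumption, not the paper's architecture.

import torch
import torch.nn as nn

class SemanticAwarePooler(nn.Module):
    """Pools patch features into a few visual tokens conditioned on the text."""
    def __init__(self, dim=768, num_queries=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, patch_feats, text_feats):
        # patch_feats: (batch, patches, dim); text_feats: (batch, text_len, dim)
        # Condition the learned queries on the mean-pooled text embedding
        # (one simple conditioning choice among many).
        b = patch_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1) + text_feats.mean(1, keepdim=True)
        pooled, _ = self.attn(q, patch_feats, patch_feats)  # (batch, num_queries, dim)
        return pooled  # compact, semantics-conditioned visual tokens for the LLM

The design intuition is that the text query filters the visual stream, so only patches relevant to the instruction survive the pooling step.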
2 code implementations • 15 Nov 2023 • Junqing He, Kunhao Pan, Xiaoqun Dong, Zhuoyang Song, Yibo Liu, Qianguo Sun, Yuxin Liang, Hao Wang, Enming Zhang, Jiaxing Zhang
While large language models (LLMs) now accept longer text inputs than before, they still struggle to locate the correct information within long contexts.
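A common way to expose this failure mode, shown here as a hedged sketch rather than the paper's benchmark, is a depth sweep: bury the same fact at different positions in long filler text and measure whether the model retrieves it. generate is again a hypothetical stand-in for an LLM call.

def generate(prompt: str) -> str:
    # Placeholder for an actual LLM client call (hypothetical, not the paper's API).
    raise NotImplementedError("plug in your LLM client here")

FACT = "The access code is 7341."
FILLER = "Lorem ipsum dolor sit amet. " * 400  # roughly 11k characters of distractor text

def depth_sweep(depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    results = {}
    for d in depths:
        cut = int(len(FILLER) * d)
        context = FILLER[:cut] + FACT + FILLER[cut:]  # bury the fact at relative depth d
        answer = generate(context + "\nWhat is the access code?")
        results[d] = "7341" in answer  # did the model recover the buried fact?
    return results

Accuracy that dips for middle depths is the "lost in the middle" pattern this line of work targets.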