no code implementations • 27 Apr 2025 • Yiyang Zhou, Zhaoyang Wang, Tianle Wang, Shangyu Xing, Peng Xia, Bo Li, Kaiyuan Zheng, Zijian Zhang, Zhaorun Chen, Wenhao Zheng, Xuchao Zhang, Chetan Bansal, Weitong Zhang, Ying WEI, Mohit Bansal, Huaxiu Yao
High-quality preference data is essential for aligning foundation models with human values through preference learning.
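Only this opening sentence of the abstract is shown, so as a generic illustration of preference learning (not the authors' method), here is a minimal Bradley-Terry-style pairwise loss: given scores a reward model assigns to a human-preferred and a rejected response, it pushes the preferred score above the rejected one. The function and variable names (`preference_loss`, `chosen_scores`, `rejected_scores`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_scores: torch.Tensor,
                    rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log sigmoid(s_chosen - s_rejected),
    averaged over the batch. Illustrative sketch, not the paper's method."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy usage: scores a reward model might assign to three preference pairs.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(preference_loss(chosen, rejected))  # lower when chosen > rejected
```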
no code implementations • 30 Dec 2024 • Shangyu Xing, Changhao Xiang, Yuteng Han, Yifan Yue, Zhen Wu, Xinyu Liu, Zhangtai Wu, Fei Zhao, Xinyu Dai
To address this limitation, we introduce GePBench, a novel benchmark designed to assess the geometric perception capabilities of MLLMs.
no code implementations • 23 May 2024 • Fei Zhao, Taotian Pang, Chunhui Li, Zhen Wu, Junjie Guo, Shangyu Xing, Xinyu Dai
Multimodal Large Language Models (MLLMs) are widely regarded as crucial in the exploration of Artificial General Intelligence (AGI).
Ranked #155 on Visual Question Answering on MM-Vet
1 code implementation • 15 Feb 2024 • Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, WeiHao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai
Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination.
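To make the phenomenon concrete, a common way to quantify object hallucination is a CHAIR-style rate: the fraction of objects mentioned in a generated caption that do not appear in the image's ground-truth object set. The sketch below is a generic illustration of that metric, not the evaluation used in this paper; the names (`hallucination_rate`, `mentioned_objects`, `image_objects`) are hypothetical.

```python
def hallucination_rate(mentioned_objects: set[str],
                       image_objects: set[str]) -> float:
    """Fraction of caption objects absent from the image (CHAIR-style)."""
    if not mentioned_objects:
        return 0.0
    hallucinated = mentioned_objects - image_objects
    return len(hallucinated) / len(mentioned_objects)

# Toy example: the caption mentions a "dog" the image does not contain.
caption_objs = {"person", "bicycle", "dog"}
ground_truth = {"person", "bicycle", "car"}
print(hallucination_rate(caption_objs, ground_truth))  # 0.333...
```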
1 code implementation • 9 Oct 2023 • Shangyu Xing, Fei Zhao, Zhen Wu, Chunhui Li, Jianbing Zhang, Xinyu Dai
Multimodal Entity Linking (MEL) is a task that aims to link ambiguous mentions within multimodal contexts to referential entities in a multimodal knowledge base.
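As a sketch of the generic retrieval formulation of MEL (assuming embedding-based candidate scoring, not the specific model proposed in this paper), a mention's fused text-and-image embedding can be linked to the knowledge-base entity with the highest cosine similarity. The names (`link_mention`, `mention_emb`, `entity_embs`) are hypothetical.

```python
import numpy as np

def link_mention(mention_emb: np.ndarray,
                 entity_embs: dict[str, np.ndarray]) -> str:
    """Return the KB entity whose embedding is most cosine-similar
    to the (text + image) mention embedding."""
    def cos(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(entity_embs, key=lambda e: cos(mention_emb, entity_embs[e]))

# Toy 3-d embeddings standing in for fused multimodal features.
rng = np.random.default_rng(0)
entities = {"Apple_Inc": rng.normal(size=3), "Apple_fruit": rng.normal(size=3)}
mention = entities["Apple_Inc"] + 0.1 * rng.normal(size=3)
print(link_mention(mention, entities))  # -> "Apple_Inc"
```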