1 code implementation • 1 Jun 2025 • Sau Lai Yip, Sunan He, Yuxiang Nie, Shu Pui Chan, Yilin Ye, Sum Ying Lam, Hao Chen
Our findings highlight critical capability gaps in current GMAI systems while establishing textbook-derived multimodal benchmarks as essential evaluation tools.
1 code implementation • 30 Apr 2025 • Linshan Wu, Yuxiang Nie, Sunan He, Jiaxin Zhuang, Hao Chen
UniBiomed is based on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM), which unifies the generation of clinical texts with the segmentation of the corresponding biomedical objects for grounded interpretation.
1 code implementation • 26 Mar 2025 • Han Wang, YongJie Ye, Bingru Li, Yuxiang Nie, Jinghui Lu, Jingqun Tang, Yanjie Wang, Can Huang
We introduce Vision as LoRA (VoRA), a novel paradigm for transforming an LLM into an MLLM.
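The entry above only names the mechanism. A minimal NumPy sketch of the "vision as LoRA" idea — a frozen LLM weight augmented with a trainable low-rank branch, with image patches fed in as ordinary tokens; all shapes and variable names here are hypothetical, not taken from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 64, 4

# Frozen LLM projection weight (stand-in for one attention/MLP matrix).
W = rng.standard_normal((d_model, d_model)) * 0.02

# LoRA factors: only A and B would be trained.
A = rng.standard_normal((d_model, rank)) * 0.02
B = np.zeros((rank, d_model))  # zero-init so the adapted model starts identical to the LLM

def lora_forward(x):
    """Frozen path plus low-rank adaptation: x W + x A B."""
    return x @ W + x @ A @ B

# Sketch of the idea: embedded image patches enter the LLM as ordinary tokens,
# and the LoRA branch absorbs the vision capability during training.
patch_tokens = rng.standard_normal((16, d_model))  # 16 hypothetical patch embeddings
text_tokens = rng.standard_normal((8, d_model))
sequence = np.concatenate([patch_tokens, text_tokens], axis=0)

out = lora_forward(sequence)
assert out.shape == (24, d_model)
# With B zero-initialised, the adapted output equals the frozen LLM output.
assert np.allclose(out, sequence @ W)
```

Zero-initialising `B` is the standard LoRA choice: the model's behaviour is unchanged at step zero, and the vision capability is learned entirely in the low-rank update.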
1 code implementation • 26 Jan 2025 • Yuxiang Nie, Sunan He, Yequan Bie, Yihui Wang, Zhixuan Chen, Shu Yang, Hao Chen
This dual alignment strategy enhances the model's capability to associate specific image regions with relevant concepts, thereby improving both the precision of analysis and the interpretability of the AI system.
1 code implementation • 12 Dec 2024 • Han Wang, Yuxiang Nie, YongJie Ye, Deng GuanYu, Yanjie Wang, Shuai Li, Haiyang Yu, Jinghui Lu, Can Huang
The application of Large Vision-Language Models (LVLMs) for analyzing images and videos is an exciting and rapidly evolving field.
1 code implementation • 23 Apr 2024 • Sunan He, Yuxiang Nie, Hongmei Wang, Shu Yang, Yihui Wang, Zhiyuan Cai, Zhixuan Chen, Yingxue Xu, Luyang Luo, Huiling Xiang, Xi Lin, Mingxiang Wu, Yifan Peng, George Shih, Ziyang Xu, Xian Wu, Qiong Wang, Ronald Cheong Kin Chan, Varut Vardhanabhuti, Winnie Chiu Wing Chu, Yefeng Zheng, Pranav Rajpurkar, Kang Zhang, Hao Chen
Specifically, we propose a cooperative framework, Generalist-Specialist Collaboration (GSCo), which consists of two stages: the construction of the GFM and specialists, and collaborative inference on downstream tasks.
1 code implementation • 4 Apr 2024 • Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, Hao Chen
To answer these questions, this survey presents a comprehensive and in-depth review of the challenges, opportunities, and future directions of HFMs.
no code implementations • 26 Mar 2024 • Yuxiang Nie, Heyan Huang, Xian-Ling Mao, Lizi Liao
Specifically, IDPT decouples initiative factors into separate prefix parameters and uses an attention mechanism to dynamically select which initiative guides generation.
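A tiny NumPy sketch of the selection mechanism the entry above describes — one learned prefix per initiative factor, mixed by attention over the dialogue context. All shapes and names are hypothetical stand-ins, not the paper's actual parameterisation:

```python
import numpy as np

rng = np.random.default_rng(1)
d, prefix_len, n_initiatives = 32, 4, 3

# One learned prefix (a short sequence of vectors) per initiative factor.
prefixes = rng.standard_normal((n_initiatives, prefix_len, d))

def select_prefix(context_vec):
    """Attention over initiative prefixes: softmax(context . key) mixes them."""
    keys = prefixes.mean(axis=1)              # (n_initiatives, d) summary key per prefix
    scores = keys @ context_vec / np.sqrt(d)  # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over initiatives
    mixed = np.tensordot(weights, prefixes, axes=1)  # weighted prefix, (prefix_len, d)
    return weights, mixed

context = rng.standard_normal(d)  # stand-in for an encoded dialogue context
weights, prefix = select_prefix(context)
assert prefix.shape == (prefix_len, d)
assert np.isclose(weights.sum(), 1.0)
```

The mixed prefix would then be prepended to the generator's input, so the attention weights determine how strongly each initiative factor steers the response.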
1 code implementation • 25 Mar 2024 • Han Wang, Yanjie Wang, YongJie Ye, Yuxiang Nie, Can Huang
Multi-modal Large Language Models (MLLMs) have demonstrated their ability to perceive objects in still images, but their application in video-related tasks, such as object tracking, remains understudied.
Ranked #2 on Zero-Shot Single Object Tracking on LaSOT
no code implementations • 25 Jun 2023 • Xiao Zhang, Heqi Zheng, Yuxiang Nie, Heyan Huang, Xian-Ling Mao
However, the dataset ignores the fact that different readers may understand a text at different levels, and it includes only single-perspective question-answer pairs, leaving differing reading perspectives unaccounted for.
1 code implementation • 3 May 2023 • Yuxiang Nie, Heyan Huang, Wei Wei, Xian-Ling Mao
To alleviate the problem, it might be possible to generate long-document QA pairs via unsupervised question answering (UQA) methods.
1 code implementation • 11 Oct 2022 • Yuxiang Nie, Heyan Huang, Wei Wei, Xian-Ling Mao
The proposed model mainly focuses on the evidence selection phase of long document question answering.
1 code implementation • COLING 2022 • Yuxiang Nie, Heyan Huang, Zewen Chi, Xian-Ling Mao
Previous works usually make use of heuristic rules as well as pre-trained models to construct data and train QA models.
2 code implementations • 25 Aug 2019 • Yong Hu, He-Yan Huang, Tian Lan, Xiaochi Wei, Yuxiang Nie, Jiarui Qi, Liner Yang, Xian-Ling Mao
Second language acquisition (SLA) modeling predicts whether second-language learners can correctly answer questions based on what they have learned.