no code implementations • 12 Mar 2025 • Xiaoda Yang, Junyu Lu, Hongshun Qiu, Sijing Li, Hao Li, Shengpeng Ji, Xudong Tang, Jiayang Xu, Jiaqi Duan, Ziyue Jiang, Cong Lin, Sihang Cai, Zejian Xie, Zhuoyang Song, Songxin Zhang
Vision-Language Models (VLMs) based on Mixture-of-Experts (MoE) architectures have emerged as a pivotal paradigm in multimodal understanding, offering a powerful framework for integrating visual and linguistic information.
no code implementations • 9 Oct 2024 • Hengxiang Zhang, Songxin Zhang, BingYi Jing, Hongxin Wei
In light of this, we introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection.
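The abstract only names the idea, but the core of a score-deviation test can be illustrated with a toy sketch: compare each sample's score (e.g. perplexity) before and after fine-tuning, and flag samples whose score barely moves as likely pretraining members. The threshold and score values below are hypothetical, not from the paper.

```python
import numpy as np

def score_deviation(score_before, score_after):
    # FSD-style deviation: how much a sample's score dropped
    # after fine-tuning the model on a small set of unseen data
    return np.asarray(score_before) - np.asarray(score_after)

# toy perplexity-like scores (hypothetical): members change little,
# non-members drop sharply once similar data is fine-tuned on
before = [2.1, 2.0, 3.5, 3.6]
after  = [2.0, 1.9, 2.4, 2.5]

dev = score_deviation(before, after)
pred_member = dev < 0.5  # small deviation -> likely seen in pretraining
```

This illustrates only the deviation idea; the actual method operates on language-model scoring functions over real text.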
no code implementations • 8 Feb 2024 • Wenyu Jiang, Zhenlong Liu, Zejian Xie, Songxin Zhang, BingYi Jing, Hongxin Wei
In this paper, we propose Distorting-based Learning Complexity (DLC), a simple, novel, and training-free hardness score that efficiently identifies informative images and instructions in the downstream dataset.
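One way to read a distortion-based hardness score, sketched here purely as an assumption (the paper's exact formulation is not given in this excerpt): distort the model's parameters with noise and measure how much each sample's loss degrades, treating robust samples as easy and fragile ones as hard. The linear model and noise scale below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def per_sample_loss(w, X, y):
    # squared error of a toy linear model, one value per sample
    return (X @ w - y) ** 2

def distortion_score(w, X, y, noise=0.5, trials=20):
    # Hypothetical distortion-based hardness: average per-sample loss
    # increase when the weights are perturbed with Gaussian noise.
    base = per_sample_loss(w, X, y)
    inc = np.zeros_like(base)
    for _ in range(trials):
        w_d = w + noise * rng.standard_normal(w.shape)
        inc += per_sample_loss(w_d, X, y) - base
    return inc / trials

# toy data: 4 samples, 3 features
X = rng.standard_normal((4, 3))
y = rng.standard_normal(4)
w = np.zeros(3)
scores = distortion_score(w, X, y)  # one hardness value per sample
```

Being training-free, such a score needs only forward passes under perturbation, which is what makes it cheap to apply for data selection.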
no code implementations • 8 Dec 2023 • Junyu Lu, Dixiang Zhang, Songxin Zhang, Zejian Xie, Zhuoyang Song, Cong Lin, Jiaxing Zhang, BingYi Jing, Pingjian Zhang
During the instruction fine-tuning stage, we introduce semantic-aware visual feature extraction, a crucial method that enables the model to extract informative features from concrete visual objects.
Ranked #1 on Image Captioning on nocaps entire