Search Results for author: Kaixiang Ji

Found 8 papers, 2 papers with code

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

1 code implementation • 5 May 2025 • Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang

We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language.

multimodal interaction • Text to Image Generation • +1
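The snippet above describes the architecture only at a high level. As a rough illustration of what a "native multimodal autoregressive model" can look like (a minimal sketch, not Ming-Lite-Uni's actual design; all class and parameter names below are hypothetical), here is a toy PyTorch backbone that autoregresses over one shared sequence of text tokens and discrete visual codes:

```python
import torch
import torch.nn as nn

class ToyUnifiedAR(nn.Module):
    """Toy autoregressive backbone over a shared text+image token sequence."""
    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_heads=8,
                 n_image_codes=8192):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.image_embed = nn.Embedding(n_image_codes, d_model)  # discrete visual codes
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.text_head = nn.Linear(d_model, vocab_size)      # predicts next text token
        self.image_head = nn.Linear(d_model, n_image_codes)  # predicts next visual code

    def forward(self, text_ids, image_ids):
        # Interleaving is elided: here we simply concatenate text then image tokens.
        x = torch.cat([self.text_embed(text_ids), self.image_embed(image_ids)], dim=1)
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.backbone(x, mask=causal)
        return self.text_head(h), self.image_head(h)
```

A real unified model would interleave modalities and drive an actual visual generator from the image head; the sketch only shows the single-backbone idea the abstract alludes to.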

Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping

no code implementations • 26 Mar 2025 • Weili Zeng, Ziyuan Huang, Kaixiang Ji, Yichao Yan

Transformer-based models have driven significant advancements in Multimodal Large Language Models (MLLMs), yet their computational costs surge drastically when scaling resolution, training data, and model parameters.
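The abstract excerpt states the problem (costs explode with resolution and scale) rather than the mechanism. Below is a minimal sketch of generic adaptive token skipping, assuming a simple norm-based saliency score; the function name and scoring rule are illustrative, not Skip-Vision's actual criterion:

```python
import torch

def skip_visual_tokens(vision_tokens, keep_ratio=0.5):
    """Keep only the highest-scoring visual tokens; drop the rest.

    vision_tokens: (batch, num_tokens, dim) features from a vision encoder.
    The saliency score here is a plain L2 norm; real methods typically use
    attention-derived scores.
    """
    scores = vision_tokens.norm(dim=-1)                       # (batch, num_tokens)
    k = max(1, int(vision_tokens.size(1) * keep_ratio))
    idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values  # keep original order
    idx = idx.unsqueeze(-1).expand(-1, -1, vision_tokens.size(-1))
    return vision_tokens.gather(1, idx)
```

For example, `skip_visual_tokens(torch.randn(2, 256, 768), keep_ratio=0.25)` reduces 256 visual tokens to 64 before they enter the language model, which is where the quadratic attention cost is paid.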

Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

no code implementations • 22 Jul 2024 • Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs).
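The snippet doesn't specify how the bridge accelerates pre-training. One plausible reading, sketched below under that assumption (all names hypothetical), is a multi-scale resampler whose visual-token budget can be kept small during pre-training and enlarged later without changing parameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBridge(nn.Module):
    """Pool a vision feature map at several grid sizes and project to LLM width.

    Coarse grids during pre-training yield few visual tokens (cheap); switching
    to finer grids later raises the token budget with the same parameters.
    """
    def __init__(self, vision_dim=1024, llm_dim=4096, grids=(4, 8)):
        super().__init__()
        self.grids = grids
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, feat_map):                              # (B, vision_dim, H, W)
        tokens = []
        for g in self.grids:
            pooled = F.adaptive_avg_pool2d(feat_map, g)       # (B, C, g, g)
            tokens.append(pooled.flatten(2).transpose(1, 2))  # (B, g*g, C)
        return self.proj(torch.cat(tokens, dim=1))            # (B, sum g^2, llm_dim)
```

With `grids=(4,)` the LLM sees 16 visual tokens per image; with `grids=(4, 8)` it sees 80, so the token count is a training-schedule knob rather than an architectural constant.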

OrchMoE: Efficient Multi-Adapter Learning with Task-Skill Synergy

no code implementations • 19 Jan 2024 • Haowen Wang, Tao Sun, Kaixiang Ji, Jian Wang, Cong Fan, Jinjie Gu

We advance the field of Parameter-Efficient Fine-Tuning (PEFT) with our novel multi-adapter method, OrchMoE, which capitalizes on modular skill architecture for enhanced forward transfer in neural networks.

Multi-Task Learning • parameter-efficient fine-tuning
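As a rough sketch of the "modular skill" idea (hypothetical names; OrchMoE's actual task-skill routing is more elaborate than this), the following mixes a shared bank of low-rank adapters with learned per-task routing weights:

```python
import torch
import torch.nn as nn

class SkillAdapterBank(nn.Module):
    """A bank of low-rank 'skill' adapters combined by learned per-task weights."""
    def __init__(self, d_model=768, rank=8, n_skills=4, n_tasks=3):
        super().__init__()
        self.down = nn.Parameter(torch.randn(n_skills, d_model, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(n_skills, rank, d_model))
        self.task_router = nn.Embedding(n_tasks, n_skills)  # routing logits per task

    def forward(self, x, task_id):                           # x: (batch, seq, d_model)
        w = self.task_router(task_id).softmax(-1)            # (batch, n_skills)
        # Apply each skill's low-rank update, then mix by the routing weights.
        delta = torch.einsum('bsd,kdr,kre->bkse', x, self.down, self.up)
        return x + torch.einsum('bk,bkse->bse', w, delta)
```

Because the skills are shared across tasks and only the routing is task-specific, a skill learned for one task can be reused by another, which is one way to get the forward transfer the abstract mentions.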

Towards Better Vision-Inspired Vision-Language Models

no code implementations • CVPR 2024 • Yun-Hao Cao, Kaixiang Ji, Ziyuan Huang, Chuanyang Zheng, Jiajia Liu, Jian Wang, Jingdong Chen, Ming Yang

In this paper, we present a vision-inspired vision-language connection module, dubbed VIVL, which efficiently exploits vision cues for VL models.
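The snippet doesn't describe VIVL's internals. A minimal sketch of one generic way to exploit vision cues, assuming a cross-attention connector with residual injection (hypothetical names, not the paper's actual module):

```python
import torch
import torch.nn as nn

class VisionCueConnector(nn.Module):
    """Let language hidden states attend to vision features via cross-attention."""
    def __init__(self, lang_dim=768, vision_dim=1024, n_heads=8):
        super().__init__()
        self.v_proj = nn.Linear(vision_dim, lang_dim)  # align vision width to LM width
        self.attn = nn.MultiheadAttention(lang_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(lang_dim)

    def forward(self, lang_states, vision_feats):
        v = self.v_proj(vision_feats)                  # (B, N_vis, lang_dim)
        out, _ = self.attn(self.norm(lang_states), v, v)
        return lang_states + out                       # residual vision injection
```

The residual form keeps the language pathway intact when the vision cue is uninformative, which is a common design choice for connector modules.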

Uncertainty-guided Learning for Improving Image Manipulation Detection

no code implementations • ICCV 2023 • Kaixiang Ji, Feng Chen, Xin Guo, Yadong Xu, Jian Wang, Jingdong Chen

Image manipulation detection (IMD) is of vital importance, as faked images and the misinformation they spread can be malicious and harm our daily lives.

Image Manipulation • Image Manipulation Detection • +1
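The abstract excerpt is motivational, but the title suggests weighting the detection loss by predictive uncertainty. A minimal sketch in the style of heteroscedastic loss attenuation (Kendall & Gal, 2017), which is an assumption here rather than the paper's stated formulation:

```python
import torch
import torch.nn.functional as F

def uncertainty_weighted_bce(logits, log_var, target):
    """Pixel-wise BCE attenuated by predicted (log-)uncertainty.

    logits, log_var, target: (batch, 1, H, W). The network predicts both a
    manipulation map and a per-pixel log-variance; uncertain pixels contribute
    less to the loss, and the 0.5 * log_var term penalizes the network for
    inflating uncertainty everywhere.
    """
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    return (torch.exp(-log_var) * bce + 0.5 * log_var).mean()
```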
