1 code implementation • 29 May 2025 • Yuanxin Liu, Kun Ouyang, Haoning Wu, Yi Liu, Lin Sui, Xinhao Li, Yan Zhong, Y. Charles, Xinyu Zhou, Xu Sun
Recent studies have shown that long chain-of-thought (CoT) reasoning can significantly enhance the performance of large language models (LLMs) on complex tasks.
2 code implementations • 24 Apr 2025 • Linli Yao, Yicheng Li, Yuancheng Wei, Lei Li, Shuhuai Ren, Yuanxin Liu, Kun Ouyang, Lean Wang, Shicheng Li, Sida Li, Lingpeng Kong, Qi Liu, Yuanxing Zhang, Xu Sun
Remarkably, our experiments demonstrate that DTD achieves an 82.8% reduction in video tokens while maintaining 98% performance on StreamingBench, revealing that over 80% of visual content in streaming videos is naturally redundant without requiring language guidance.
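The DTD mechanism itself is not reproduced here, but as a rough sketch of the general idea (dropping visual tokens that barely change across consecutive frames), something like the following could be used; the cosine-similarity threshold and tensor shapes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def drop_redundant_tokens(prev_frame_tokens, curr_frame_tokens, threshold=0.9):
    """Keep only the current-frame tokens whose cosine similarity to the
    spatially aligned token in the previous frame falls below `threshold`.

    prev_frame_tokens, curr_frame_tokens: (num_patches, hidden_dim) tensors.
    Returns the retained tokens and their patch indices.
    """
    similarity = F.cosine_similarity(prev_frame_tokens, curr_frame_tokens, dim=-1)
    keep_mask = similarity < threshold          # low similarity => visual change => keep
    keep_indices = keep_mask.nonzero(as_tuple=True)[0]
    return curr_frame_tokens[keep_indices], keep_indices

# Illustrative usage with random features standing in for ViT patch embeddings.
prev = torch.randn(196, 1024)
curr = prev.clone()
curr[:20] += torch.randn(20, 1024)              # only the first 20 patches change
kept, idx = drop_redundant_tokens(prev, curr)
print(f"kept {kept.shape[0]} of 196 tokens")
```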
1 code implementation • 10 Apr 2025 • Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, HaoNing Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin
We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).
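As a generic illustration of why a sparse MoE decoder activates only a fraction of its parameters per token, here is a minimal top-k routing sketch; the expert count, top-k value, and dimensions are arbitrary assumptions and do not describe Kimi-VL's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic sparse MoE layer: each token is routed to k of E experts,
    so only a fraction of the layer's parameters is active per token."""
    def __init__(self, dim=512, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):                      # x: (tokens, dim)
        weights, chosen = F.softmax(self.router(x), dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):             # dispatch each token to its k experts
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 512)).shape)         # torch.Size([10, 512])
```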
2 code implementations • 2 Apr 2025 • Kun Ouyang, Yuanxin Liu, Haoning Wu, Yi Liu, Hao Zhou, Jie Zhou, Fandong Meng, Xu Sun
Motivated by the success of Reinforcement Learning with Verifiable Reward (RLVR) in unlocking LLM reasoning abilities, this work aims to improve MLLMs in video spatial reasoning through the RLVR paradigm.
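The defining ingredient of RLVR is a reward that is checked programmatically rather than predicted by a learned reward model; a minimal sketch for multiple-choice answers might look like this (the answer format and regex are illustrative assumptions, not the paper's implementation).

```python
import re

def verifiable_reward(model_response: str, ground_truth: str) -> float:
    """Return 1.0 if the final answer extracted from the model's response
    matches the ground-truth option, else 0.0. No learned reward model is used."""
    match = re.search(r"answer\s*(?:is|:)?\s*\(?([A-D])\)?", model_response, re.IGNORECASE)
    if match is None:
        return 0.0
    return float(match.group(1).upper() == ground_truth.upper())

print(verifiable_reward("The object moved left, so the answer is (B).", "B"))  # 1.0
print(verifiable_reward("I am not sure.", "B"))                                # 0.0
```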
1 code implementation • 21 Mar 2025 • Shicheng Li, Lei Li, Kun Ouyang, Shuhuai Ren, Yuanxin Liu, Yuanxing Zhang, Fuzheng Zhang, Lingpeng Kong, Qi Liu, Xu Sun
We further analyze the transferability of DPO data across architectures and the role of difficulty scheduling in optimization.
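For context, DPO trains on preference pairs with the standard pairwise objective sketched below; this is a generic formulation, assuming the summed sequence log-probabilities under the policy and a frozen reference model are already computed, and is not the paper's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: push the policy's preference margin
    (relative to a frozen reference model) toward the chosen response."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of summed sequence log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```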
1 code implementation • 13 Mar 2025 • Yuanxin Liu, Rui Zhu, Shuhuai Ren, Jiacong Wang, Haoyuan Guo, Xu Sun, Lu Jiang
To evaluate the performance of automatic metrics in unified AIGV evaluation, we introduce a benchmark called UVE-Bench.
no code implementations • 31 Jan 2025 • Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge.
no code implementations • 16 Dec 2024 • Kun Ouyang, Yuanxin Liu, Shicheng Li, Yi Liu, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun
To provide a comprehensive evaluation, PunchBench incorporates diverse question formats and image-caption pairs from various domains.
no code implementations • 8 Oct 2024 • Lei Li, Yuanxin Liu, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu Sun, Lingpeng Kong, Qi Liu
Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships.
1 code implementation • 31 May 2024 • Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou
Specifically, we trace back the semantic relevance flow from generated language tokens to raw visual encoder patches and the intermediate outputs produced by projectors.
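One common way to trace such relevance back to input patches is attention rollout, which chains per-layer attention maps; the sketch below shows that generic procedure only and is not the paper's specific tracing method.

```python
import torch

def attention_rollout(attn_maps):
    """Generic attention rollout: attn_maps is a list of per-layer attention
    matrices of shape (tokens, tokens), averaged over heads. Multiplying them
    (with residual mixing) approximates how much each output position
    attends, transitively, to each input position."""
    num_tokens = attn_maps[0].shape[-1]
    rollout = torch.eye(num_tokens)
    identity = torch.eye(num_tokens)
    for attn in attn_maps:
        attn = 0.5 * attn + 0.5 * identity      # account for residual connections
        attn = attn / attn.sum(dim=-1, keepdim=True)
        rollout = attn @ rollout
    return rollout                              # (tokens, tokens) relevance map

# Toy example: 3 layers over 10 tokens (e.g. visual patches + text tokens).
layers = [torch.softmax(torch.randn(10, 10), dim=-1) for _ in range(3)]
relevance = attention_rollout(layers)
print(relevance[-1])                            # relevance of the last token to every input position
```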
1 code implementation • 28 Mar 2024 • Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries.
1 code implementation • 1 Mar 2024 • Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou
Motivated by these two problems, we propose the TempCompass benchmark, which introduces a diversity of temporal aspects and task formats.
1 code implementation • 29 Nov 2023 • Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou
The ability to perceive how objects change over time is a crucial ingredient in human intelligence.
1 code implementation • NeurIPS 2023 • Yuanxin Liu, Lei Li, Shuhuai Ren, Rundong Gao, Shicheng Li, Sishuo Chen, Xu Sun, Lu Hou
The multi-aspect categorization of FETV enables fine-grained analysis of the metrics' reliability in different scenarios.
1 code implementation • 27 Oct 2022 • Bowen Shen, Zheng Lin, Yuanxin Liu, Zhengxiao Liu, Lei Wang, Weiping Wang
Motivated by such considerations, we propose a collaborative optimization for PLMs that integrates static model compression and dynamic inference acceleration.
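Static compression fixes the model size ahead of time, while dynamic acceleration adapts computation per input; one common dynamic mechanism is early exiting, sketched generically below (the layer layout and confidence threshold are illustrative assumptions, not the paper's design).

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Toy encoder with an exit classifier after every layer: easy inputs
    stop early, hard inputs run all layers."""
    def __init__(self, dim=256, num_layers=6, num_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
                                    for _ in range(num_layers))
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_layers))

    def forward(self, x, confidence_threshold=0.95):
        for depth, (layer, exit_head) in enumerate(zip(self.layers, self.exits), start=1):
            x = layer(x)
            probs = torch.softmax(exit_head(x), dim=-1)
            if probs.max() >= confidence_threshold:   # confident enough: stop here
                return probs, depth
        return probs, depth                           # fell through: used all layers

model = EarlyExitEncoder()
probs, used_layers = model(torch.randn(1, 256))
print(f"prediction made after {used_layers} layers")
```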
1 code implementation • 26 Oct 2022 • Qingyi Si, Yuanxin Liu, Zheng Lin, Peng Fu, Weiping Wang
To this end, we systematically study the design of a training and compression pipeline to search the subnetworks, as well as the assignment of sparsity to different modality-specific modules.
1 code implementation • 11 Oct 2022 • Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting the performance.
1 code implementation • 10 Oct 2022 • Qingyi Si, Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
However, these models reveal a trade-off that the improvements on OOD data severely sacrifice the performance on the in-distribution (ID) data (which is dominated by the biased samples).
1 code implementation • 10 Oct 2022 • Qingyi Si, Fandong Meng, Mingyu Zheng, Zheng Lin, Yuanxin Liu, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
To overcome this limitation, we propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets.
1 code implementation • NAACL 2022 • Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability.
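For reference, magnitude pruning keeps the weights with the largest absolute values and zeroes out the rest; a minimal, generic sketch (not the paper's implementation):

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a binary mask that keeps the (1 - sparsity) fraction of entries
    with the largest absolute value and zeroes out the rest."""
    num_to_keep = int(weight.numel() * (1.0 - sparsity))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - num_to_keep).values
    return (weight.abs() > threshold).float()

weight = torch.randn(768, 768)
mask = magnitude_prune(weight, sparsity=0.7)
pruned = weight * mask                       # the sparse subnetwork's weights
print(f"actual sparsity: {1 - mask.mean().item():.2f}")
```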
1 code implementation • ACL 2021 • Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou
In this paper, however, we observe that although distilling the teacher's hidden state knowledge (HSK) is helpful, the performance gain (marginal utility) diminishes quickly as more HSK is distilled.
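HSK distillation is commonly implemented as a regression loss between (projected) student hidden states and the matched teacher hidden states; the sketch below shows that generic form, with the layer matching and projection being assumptions rather than the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def hidden_state_distillation_loss(student_hidden, teacher_hidden, projection):
    """MSE between projected student hidden states and teacher hidden states.
    student_hidden: (batch, seq_len, student_dim)
    teacher_hidden: (batch, seq_len, teacher_dim)
    projection:     maps student_dim -> teacher_dim (needed when widths differ).
    """
    return F.mse_loss(projection(student_hidden), teacher_hidden)

# Toy setup: a 384-dim student mimicking selected layers of a 768-dim teacher.
projection = nn.Linear(384, 768)
student_h = torch.randn(2, 16, 384)
teacher_h = torch.randn(2, 16, 768)
print(hidden_state_distillation_loss(student_h, teacher_h, projection).item())
```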
1 code implementation • 21 Mar 2021 • Yuanxin Liu, Zheng Lin, Fengcheng Yuan
Based on the empirical findings, our best compressed model, dubbed Refined BERT cOmpreSsion with InTegrAted techniques (ROSITA), is $7.5\times$ smaller than BERT while maintaining $98.5\%$ of the performance on five tasks of the GLUE benchmark, outperforming previous BERT compression methods with a similar parameter budget.
1 code implementation • 3 Dec 2020 • Qingyi Si, Yuanxin Liu, Peng Fu, Zheng Lin, Jiangnan Li, Weiping Wang
A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage.
no code implementations • 28 Feb 2020 • Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu Sun
Recently, attention-based encoder-decoder models have been used extensively in image captioning.
no code implementations • 13 Nov 2019 • Yuanxin Liu, Zheng Lin
They are classified into architecture-based methods and strategy-based methods, based on their way of handling the above obstacle.
no code implementations • CoNLL 2019 • Fenglin Liu, Meng Gao, Yuanxin Liu, Kai Lei
Residual connections have been widely applied to build deep neural networks with enhanced feature propagation and improved accuracy.
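A residual connection simply adds a block's input to its output, which eases feature and gradient propagation through deep stacks; a minimal sketch:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the block learns a residual on top of the identity path."""
    def __init__(self, dim=256):
        super().__init__()
        self.transform = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.transform(x)

x = torch.randn(4, 256)
print(ResidualBlock()(x).shape)   # torch.Size([4, 256])
```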
no code implementations • IJCNLP 2019 • Yanfu Xu, Zheng Lin, Yuanxin Liu, Rui Liu, Weiping Wang, Dan Meng
Open-domain question answering (OpenQA) aims to answer questions based on a number of unlabeled paragraphs.
1 code implementation • NeurIPS 2019 • Fenglin Liu, Yuanxin Liu, Xuancheng Ren, Xiaodong He, Xu Sun
In vision-and-language grounding problems, fine-grained representations of the image are considered to be of paramount importance.
1 code implementation • EMNLP 2018 • Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Houfeng Wang, Xu Sun
The encoder-decoder framework has shown recent success in image captioning.