no code implementations • 12 May 2025 • Muzhi Dai, Chenxu Yang, Qingyi Si
As Test-Time Scaling emerges as an active research focus in the large language model community, advanced post-training methods increasingly emphasize extending chain-of-thought (CoT) generation length, thereby enhancing reasoning capabilities and approaching DeepSeek-R1-like reasoning models.
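As a hedged illustration of what extending generation at test time looks like in practice, the sketch below simply raises the token budget for a Hugging Face causal LM; the model name, prompt, and budgets are placeholders, and this is a generic baseline rather than the method of either paper in this listing.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any instruction-tuned causal LM would do here.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Question: What is 17 * 24? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")

# "Scaling" test-time compute here just means allowing a longer chain of thought.
for budget in (64, 512):
    out = model.generate(**inputs, max_new_tokens=budget, do_sample=False)
    answer = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(f"--- budget={budget} tokens ---\n{answer}\n")
```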
1 code implementation • 22 Apr 2025 • Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Qiaowei Li, Zheng Lin, Li Cao, Weiping Wang
Recent advances in large reasoning language models (LRLMs) rely on test-time scaling, which extends long chain-of-thought (CoT) generation to solve complex tasks.
1 code implementation • 16 Mar 2025 • Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie
Multimodal Large Language Models (MLLMs) have revolutionized video understanding, yet are still limited by context length when processing long videos.
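A minimal sketch of one generic way to cope with the context limit mentioned above, uniform frame sampling under a fixed token budget; the token counts are assumptions and this is not the method proposed in the paper.

```python
import numpy as np

def sample_frames(num_frames: int, tokens_per_frame: int, context_budget: int) -> list:
    """Pick evenly spaced frame indices so that the visual tokens fit the budget."""
    max_frames = max(1, context_budget // tokens_per_frame)
    if num_frames <= max_frames:
        return list(range(num_frames))
    return np.linspace(0, num_frames - 1, max_frames).round().astype(int).tolist()

# Example: a 1-hour video sampled at 1 fps, 196 visual tokens per frame,
# and an 8k-token context leave room for roughly 40 frames.
indices = sample_frames(num_frames=3600, tokens_per_frame=196, context_budget=8192)
print(len(indices), indices[:5])
```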
1 code implementation • 29 Dec 2024 • Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie
Video Large Language Models (VideoLLMs) have made significant strides in video understanding but struggle with long videos due to the limitations of their backbone LLMs.
1 code implementation • 19 Dec 2024 • Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang
The retrieval-based multi-image question answering (QA) task involves retrieving multiple question-related images and synthesizing these images to generate an answer.
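The sketch below shows the retrieval half of such a pipeline as a plain cosine-similarity search over precomputed embeddings; the embedding dimension, the value of k, and the random vectors are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def retrieve_top_k(question_emb: np.ndarray, image_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k images whose embeddings are most similar to the question."""
    q = question_emb / np.linalg.norm(question_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ q                      # cosine similarity per image
    return np.argsort(-scores)[:k]

# Toy embeddings standing in for a real question/image encoder.
rng = np.random.default_rng(0)
question_emb = rng.normal(size=512)
image_embs = rng.normal(size=(100, 512))
print(retrieve_top_k(question_emb, image_embs, k=3))
```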
no code implementations • 4 Nov 2024 • Siyuan Chen, Qingyi Si, Chenxu Yang, Yunzhi Liang, Zheng Lin, Huan Liu, Weiping Wang
The advent of large language models (LLMs) has significantly propelled the advancement of Role-Playing Agents (RPAs).
1 code implementation • 1 Aug 2024 • Huishan Ji, Qingyi Si, Zheng Lin, Weiping Wang
Throughout the rapid development of multimodal large language models, a crucial ingredient is a fair and accurate evaluation of their multimodal comprehension abilities.
1 code implementation • 12 Jun 2024 • Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, Weiping Wang
Although great progress has been made by previous table understanding methods, including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text sequence (such as Markdown or HTML) to serve as model input.
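To make that serialization premise concrete, here is a minimal sketch of turning a table into Markdown text before handing it to a text-only model; the table contents and prompt are invented for illustration (pandas' `to_markdown` needs the `tabulate` package).

```python
import pandas as pd

# Toy table, invented purely for illustration.
table = pd.DataFrame({"Player": ["A. Smith", "B. Jones"], "Goals": [12, 9]})

# Serialize the table into a Markdown string, the kind of text sequence
# that text-only LLM pipelines typically require as input.
table_as_text = table.to_markdown(index=False)

prompt = (
    "Answer the question using the table below.\n\n"
    f"{table_as_text}\n\n"
    "Question: Who scored more goals?"
)
print(prompt)
```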
no code implementations • 7 Jun 2024 • Jiangnan Li, Zheng Lin, Lanrui Wang, Qingyi Si, Yanan Cao, Mo Yu, Peng Fu, Weiping Wang, Jie Zhou
Moreover, EDEN can help LLMs achieve better recognition of emotions and causes, opening up a new research direction of explainable emotion understanding in dialogues.
1 code implementation • 4 Feb 2024 • Hanwen Zhang, Qingyi Si, Peng Fu, Zheng Lin, Weiping Wang
Finally, we analyze possible directions to improve the accuracy of TFV via LLMs, which will benefit further research on table reasoning.
1 code implementation • 20 Dec 2023 • Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang
In this paper, we propose a novel VQA approach from the perspective of utilizing object attributes, aiming to achieve better object-level visual-language alignment and multimodal scene understanding.
1 code implementation • 11 Oct 2023 • Qingyi Si, Tong Wang, Zheng Lin, Xu Zhang, Yanan Cao, Weiping Wang
This paper will release a powerful Chinese LLM that is comparable to ChatGLM.
1 code implementation • 10 May 2023 • Qingyi Si, Yuchen Mo, Zheng Lin, Huishan Ji, Weiping Wang
Some existing solutions draw external knowledge into the cross-modality space, which overlooks the much vaster textual knowledge available in natural-language space; others transform the image into text that is then fused with textual knowledge in natural-language space, completely abandoning the use of visual features.
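A hedged sketch of the second family of approaches, transforming the image into text so that everything lives in natural-language space; the captioning model, file path, knowledge snippet, and prompt format are illustrative choices, not the paper's configuration.

```python
from transformers import pipeline

# Off-the-shelf captioner used only to illustrate the image-to-text step.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

image_path = "example.jpg"  # placeholder path
caption = captioner(image_path)[0]["generated_text"]

external_knowledge = "Golden retrievers were originally bred as gun dogs."  # illustrative snippet
question = "What was this breed originally bred for?"

# Everything is now plain text, so a text-only LLM can reason over it,
# at the cost of discarding the original visual features.
prompt = (
    f"Image description: {caption}\n"
    f"Relevant knowledge: {external_knowledge}\n"
    f"Question: {question}\nAnswer:"
)
print(prompt)
```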
1 code implementation • 26 Oct 2022 • Qingyi Si, Yuanxin Liu, Zheng Lin, Peng Fu, Weiping Wang
To this end, we systematically study the design of a training and compression pipeline to search the subnetworks, as well as the assignment of sparsity to different modality-specific modules.
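As a rough illustration of assigning different sparsity levels to modality-specific modules, the sketch below applies one-shot L1 magnitude pruning with PyTorch; the module shapes and sparsity ratios are assumptions, and the paper's actual training-and-compression pipeline is more involved.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-ins for modality-specific modules.
modules = {
    "vision_encoder": (nn.Linear(768, 768), 0.5),   # prune 50% of weights
    "text_encoder":   (nn.Linear(768, 768), 0.3),   # prune 30%
    "fusion_layer":   (nn.Linear(1536, 768), 0.1),  # prune 10%
}

# One-shot L1 magnitude pruning with a different sparsity per module,
# yielding a subnetwork whose surviving weights could then be fine-tuned.
for name, (module, amount) in modules.items():
    prune.l1_unstructured(module, name="weight", amount=amount)
    kept = module.weight_mask.float().mean().item()
    print(f"{name}: {kept:.0%} of weights kept")
```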
1 code implementation • 10 Oct 2022 • Qingyi Si, Fandong Meng, Mingyu Zheng, Zheng Lin, Yuanxin Liu, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
To overcome this limitation, we propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets.
1 code implementation • 10 Oct 2022 • Qingyi Si, Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou
However, these models reveal a trade-off that the improvements on OOD data severely sacrifice the performance on the in-distribution (ID) data (which is dominated by the biased samples).
1 code implementation • 16 Mar 2022 • Duo Zheng, Fandong Meng, Qingyi Si, Hairun Fan, Zipeng Xu, Jie Zhou, Fangxiang Feng, Xiaojie Wang
Visual dialog has witnessed great progress after introducing various vision-oriented goals into the conversation, such as GuessWhich and GuessWhat, where the single image is visible to only the questioner or to both the questioner and the answerer, respectively.
1 code implementation • ACL 2021 • Qingyi Si, Zheng Lin, Mingyu Zheng, Peng Fu, Weiping Wang
Moreover, they only explore the interaction between the image and the question, ignoring the semantics of candidate answers.
1 code implementation • 29 Dec 2020 • Jiangnan Li, Zheng Lin, Peng Fu, Qingyi Si, Weiping Wang
It can be regarded as a personalized and interactive emotion recognition task, which should consider not only the semantic information of the text but also the influence of the speakers.
Ranked #39 on Emotion Recognition in Conversation on IEMOCAP
1 code implementation • 3 Dec 2020 • Qingyi Si, Yuanxin Liu, Peng Fu, Zheng Lin, Jiangnan Li, Weiping Wang
A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage.
no code implementations • 8 Jan 2019 • Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu
From the experimental results, we can conclude that the difference fusion is comparable to the union fusion, and the similarity fusion needs to be activated by the union fusion.
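One common reading of these fusion operators for a pair of sentence embeddings is union as concatenation, difference as element-wise subtraction, and similarity as element-wise product; the sketch below uses that reading as an assumption, since the exact operators are not spelled out in this excerpt.

```python
import torch

def fuse(a: torch.Tensor, b: torch.Tensor) -> dict:
    """Three heuristic fusion operators over two sentence embeddings (assumed mapping)."""
    return {
        "union": torch.cat([a, b], dim=-1),  # keep both views side by side
        "difference": a - b,                 # directional mismatch between the two
        "similarity": a * b,                 # per-dimension agreement
    }

premise, hypothesis = torch.randn(300), torch.randn(300)
for name, vec in fuse(premise, hypothesis).items():
    print(name, tuple(vec.shape))
```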