Search Results for author: Qingyi Si

Found 22 papers, 18 papers with code

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

no code implementations · 12 May 2025 · Muzhi Dai, Chenxu Yang, Qingyi Si

As test-time scaling emerges as an active research focus in the large language model community, advanced post-training methods increasingly emphasize extending chain-of-thought (CoT) generation length, thereby enhancing reasoning capabilities to approach DeepSeek-R1-like reasoning models.

GSM8K · Large Language Model · +3

Dynamic Early Exit in Reasoning Models

1 code implementation · 22 Apr 2025 · Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Qiaowei Li, Zheng Lin, Li Cao, Weiping Wang

Recent advances in large reasoning language models (LRLMs) rely on test-time scaling, which extends long chain-of-thought (CoT) generation to solve complex tasks.

GSM8K · Math

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding

1 code implementation · 16 Mar 2025 · Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie

Multimodal Large Language Models (MLLMs) have revolutionized video understanding, yet are still limited by context length when processing long videos.

Video Understanding

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

1 code implementation · 29 Dec 2024 · Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie

Video Large Language Models (VideoLLMs) have made significant strides in video understanding but struggle with long videos due to the limitations of their backbone LLMs.

Video Compression · Video Understanding

Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering

1 code implementation · 19 Dec 2024 · Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang

The retrieval-based multi-image question answering (QA) task involves retrieving multiple question-related images and synthesizing them to generate an answer.

Contrastive Learning · Language Modeling · +6

Towards Flexible Evaluation for Generative Visual Question Answering

1 code implementation · 1 Aug 2024 · Huishan Ji, Qingyi Si, Zheng Lin, Weiping Wang

Throughout the rapid development of multimodal large language models, a crucial ingredient has been a fair and accurate evaluation of their multimodal comprehension abilities.

Decoder · Generative Visual Question Answering · +4

Multimodal Table Understanding

1 code implementation · 12 Jun 2024 · Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, Weiping Wang

Although great progress has been made by previous table understanding methods, including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a text sequence (such as Markdown or HTML) to serve as model input.

Language Modeling · Language Modelling · +2

Think out Loud: Emotion Deducing Explanation in Dialogues

no code implementations · 7 Jun 2024 · Jiangnan Li, Zheng Lin, Lanrui Wang, Qingyi Si, Yanan Cao, Mo Yu, Peng Fu, Weiping Wang, Jie Zhou

Moreover, EDEN helps LLMs better recognize emotions and their causes, opening a new research direction of explainable emotion understanding in dialogues.

Common Sense Reasoning · Emotion Cause Extraction

Are Large Language Models Table-based Fact-Checkers?

1 code implementation · 4 Feb 2024 · Hanwen Zhang, Qingyi Si, Peng Fu, Zheng Lin, Weiping Wang

Finally, we analyze possible directions for improving the accuracy of table-based fact verification (TFV) via LLMs, which benefits further research on table reasoning.

Fact Verification · In-Context Learning · +2

Object Attribute Matters in Visual Question Answering

1 code implementation · 20 Dec 2023 · Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang

In this paper, we propose a novel VQA approach from the perspective of utilizing object attributes, aiming to achieve better object-level visual-language alignment and multimodal scene understanding.

Attribute · Graph Neural Network · +6

An Empirical Study of Instruction-tuning Large Language Models in Chinese

1 code implementation · 11 Oct 2023 · Qingyi Si, Tong Wang, Zheng Lin, Xu Zhang, Yanan Cao, Weiping Wang

This paper releases a powerful Chinese LLM that is comparable to ChatGLM.

Combo of Thinking and Observing for Outside-Knowledge VQA

1 code implementation · 10 May 2023 · Qingyi Si, Yuchen Mo, Zheng Lin, Huishan Ji, Weiping Wang

Some existing solutions draw external knowledge into the cross-modality space, overlooking the much vaster textual knowledge in natural-language space, while others transform the image into text that is then fused with textual knowledge in the natural-language space, completely abandoning the use of visual features.

Decoder · Question Answering · +2

Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering

1 code implementation · 26 Oct 2022 · Qingyi Si, Yuanxin Liu, Zheng Lin, Peng Fu, Weiping Wang

To this end, we systematically study the design of a training and compression pipeline to search for such subnetworks, as well as the assignment of sparsity to different modality-specific modules.

Question Answering · Visual Question Answering

Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA

1 code implementation · 10 Oct 2022 · Qingyi Si, Fandong Meng, Mingyu Zheng, Zheng Lin, Yuanxin Liu, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou

To overcome this limitation, we propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets.

Question Answering · Visual Question Answering

Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning

1 code implementation · 10 Oct 2022 · Qingyi Si, Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou

However, these models reveal a trade-off: the improvements on OOD data come at the severe expense of performance on in-distribution (ID) data, which is dominated by the biased samples.

Contrastive Learning · Question Answering · +1

Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

1 code implementation · 16 Mar 2022 · Duo Zheng, Fandong Meng, Qingyi Si, Hairun Fan, Zipeng Xu, Jie Zhou, Fangxiang Feng, Xiaojie Wang

Visual dialog has witnessed great progress after introducing various vision-oriented goals into the conversation, especially in GuessWhich and GuessWhat, where the single image is visible to only one or to both of the questioner and the answerer, respectively.

Visual Dialog

Check It Again: Progressive Visual Question Answering via Visual Entailment

1 code implementation · 8 Jun 2021 · Qingyi Si, Zheng Lin, Mingyu Zheng, Peng Fu, Weiping Wang

Moreover, they only explore the interaction between the image and the question, ignoring the semantics of candidate answers.

Question Answering · Visual Entailment · +1

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

1 code implementation · 29 Dec 2020 · Jiangnan Li, Zheng Lin, Peng Fu, Qingyi Si, Weiping Wang

It can be regarded as a personalized and interactive emotion recognition task, which should consider not only the semantic information of the text but also the influence of the speakers.

Emotion Recognition in Conversation

Learning Class-Transductive Intent Representations for Zero-shot Intent Detection

1 code implementation · 3 Dec 2020 · Qingyi Si, Yuanxin Liu, Peng Fu, Zheng Lin, Jiangnan Li, Weiping Wang

A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage.

Intent Detection · Multi-Task Learning · +1

Multi-Perspective Fusion Network for Commonsense Reading Comprehension

no code implementations · 8 Jan 2019 · Chunhua Liu, Yan Zhao, Qingyi Si, Haiou Zhang, Bohan Li, Dong Yu

From the experimental results, we conclude that difference fusion is comparable to union fusion, while similarity fusion needs to be activated by union fusion.

Reading Comprehension
