no code implementations • 4 Jun 2025 • Shuang Chen, Yue Guo, Zhaochen Su, Yafu Li, Yulun Wu, Jiacheng Chen, Jiayu Chen, Weijie Wang, Xiaoye Qu, Yu Cheng
Inspired by the remarkable reasoning capabilities of DeepSeek-R1 in complex textual tasks, many works attempt to incentivize similar capabilities in Multimodal Large Language Models (MLLMs) by directly applying reinforcement learning (RL).
1 code implementation • 26 May 2025 • Zican Hu, Wei Liu, Xiaoye Qu, Xiangyu Yue, Chunlin Chen, Zhi Wang, Yu Cheng
While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios.
1 code implementation • 26 May 2025 • Yifan Jia, Kailin Jiang, Yuyang Liang, Qihan Ren, Yi Xin, Rui Yang, Fenze Feng, Mingcai Chen, Hengyang Lu, Haozhe Wang, Xiaoye Qu, Dongrui Liu, Lizhen Cui, Yuntao Du
Large Multimodal Models (LMMs) face notable challenges when encountering multimodal knowledge conflicts, particularly under retrieval-augmented generation (RAG) frameworks where the contextual information from external sources may contradict the model's internal parametric knowledge, leading to unreliable outputs.
1 code implementation • 25 May 2025 • Chuming Shen, Wei Wei, Xiaoye Qu, Yu Cheng
Our analysis of the attention map confirms enhanced focus on critical regions, which brings improvements in accuracy.
1 code implementation • 25 May 2025 • Xinyao Liao, Wei Wei, Xiaoye Qu, Yu Cheng
Recent advances in text-to-image (T2I) diffusion model fine-tuning leverage reinforcement learning (RL) to align generated images with learnable reward functions.
1 code implementation • 20 May 2025 • Tingchen Fu, Jiawei Gu, Yafu Li, Xiaoye Qu, Yu Cheng
Instruction-following is essential for aligning large language models (LLMs) with user intent.
1 code implementation • 13 May 2025 • Zhaochen Su, Linjie Li, Mingyang Song, Yunzhuo Hao, Zhengyuan Yang, Jun Zhang, Guanjie Chen, Jiawei Gu, Juntao Li, Xiaoye Qu, Yu Cheng
We hope OpenThinkIMG can serve as a foundational framework for advancing dynamic, tool-augmented visual reasoning, helping the community develop AI agents that can genuinely "think with images".
1 code implementation • 21 Apr 2025 • Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang
Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning with verifiable rewards (RLVR).
1 code implementation • 9 Apr 2025 • Zhilin Wang, Yafu Li, Xiaoye Qu, Yu Cheng
Some approaches use routers to assign tasks to experts, but in continual learning, they often require retraining for optimal performance.
1 code implementation • 27 Mar 2025 • Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, BoWen Zhou, Yu Cheng
Recent Large Reasoning Models (LRMs), such as DeepSeek-R1 and OpenAI o1, have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference.
no code implementations • CVPR 2025 • Mingyang Song, Xiaoye Qu, Jiawei Zhou, Yu Cheng
Despite this success, the training data of LVLMs still suffers from Long-Tail (LT) problems, where the data distribution is highly imbalanced.
1 code implementation • 7 Mar 2025 • Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng
Linear-MoE leverages the advantages of both LSM modules for linear-complexity sequence modeling and MoE layers for sparse activation, aiming to offer high performance with efficient training.
1 code implementation • CVPR 2025 • Jie Tian, Xiaoye Qu, Zhenyi Lu, Wei Wei, Sichen Liu, Yu Cheng
(3) With the above two-stage models excelling in motion controllability and degree, we decouple the relevant parameters associated with each type of motion ability and inject them into the base I2V-DM.
1 code implementation • 24 Feb 2025 • Chenghao Fan, Zhenyi Lu, Sichen Liu, Chengfeng Gu, Xiaoye Qu, Wei Wei, Yu Cheng
While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT).
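For context, LoRA freezes the pretrained weight matrix and learns only a low-rank update, which is why it trains far fewer parameters than Full FT. A minimal sketch (dimensions, initialization, and scaling chosen for illustration, following the standard LoRA formulation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4               # hidden size, LoRA rank, scaling factor

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T : base path plus low-rank update
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(1, d))
# Because B starts at zero, the adapter is an exact no-op before training:
assert np.allclose(lora_forward(x), x @ W.T)
```

Only A and B (2 * r * d parameters) receive gradients, versus d * d for full fine-tuning; the gap to Full FT that the paper targets comes from this restricted update space.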
1 code implementation • 11 Feb 2025 • Weigao Sun, Disen Lan, Yiran Zhong, Xiaoye Qu, Yu Cheng
In this paper, we introduce LASP-2, a new SP method to enhance both communication and computation parallelism when training linear attention transformer models with very-long input sequences.
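The linear attention that LASP-2 parallelizes admits a recurrent form in which a fixed-size key-value state replaces the T x T attention matrix; this is what makes very-long-sequence training tractable. A toy sketch of that recurrence (the feature map here is an illustrative stand-in, and normalization is omitted; LASP-2's actual contribution, the sequence-parallel communication scheme, is not shown):

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 4                              # sequence length, head dimension
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

phi = lambda x: np.maximum(x, 0) + 1.0   # simple positive feature map (illustrative)

# Recurrent form: a running (d x d) state replaces the T x T attention matrix,
# so each step costs O(d^2) and the whole sequence O(T d^2) instead of O(T^2 d).
S = np.zeros((d, d))
out = []
for t in range(T):
    S = S + np.outer(phi(K[t]), V[t])    # accumulate key-value state
    out.append(phi(Q[t]) @ S)
out = np.array(out)

# Equivalent parallel (causal) form, for checking the recurrence:
attn = np.tril(phi(Q) @ phi(K).T)
assert np.allclose(out, attn @ V)
```

Because the state S is an associative sum over timesteps, sequence chunks on different devices can compute partial states independently and combine them, which is the property sequence-parallel methods like LASP-2 exploit.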
1 code implementation • 22 Jan 2025 • Yafu Li, Xuyang Hu, Xiaoye Qu, Linjie Li, Yu Cheng
In this work, we introduce Test-time Preference Optimization (TPO), a framework that aligns LLM outputs with human preferences during inference, removing the need to update model parameters.
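The simplest instance of inference-time alignment without weight updates is best-of-N sampling against a frozen reward model; TPO goes further by iteratively refining outputs, but the sketch below conveys the core premise that preference signals can steer generation at test time. The `generate` and `reward` functions here are hypothetical stand-ins, not the paper's components:

```python
import random

def best_of_n(prompt, generate, reward, n=4):
    """Sample n candidates and return the one a frozen reward model
    scores highest. No model parameters are updated."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins for a sampler and a preference/reward model (hypothetical):
random.seed(0)
generate = lambda p: f"{p} -> answer #{random.randint(0, 9)}"
reward = lambda text: len(text)   # toy reward: prefer longer outputs

out = best_of_n("2+2?", generate, reward)
assert out.startswith("2+2?")
```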
1 code implementation • 6 Jan 2025 • Mingyang Song, Zhaochen Su, Xiaoye Qu, Jiawei Zhou, Yu Cheng
Since language models are prone to various types of errors during the reasoning process, PRMs are required to possess nuanced capabilities for detecting various implicit error types in real-world scenarios.
1 code implementation • 26 Nov 2024 • Guanjie Chen, Xinyu Zhao, Yucheng Zhou, Xiaoye Qu, Tianlong Chen, Yu Cheng
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability.
1 code implementation • 24 Nov 2024 • Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu Cheng
Recently, inspired by the concept of sparsity, Mixture-of-Experts (MoE) models have gained increasing popularity for scaling model size while keeping the number of activated parameters constant.
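The "constant activated parameters" property comes from top-k routing: only a few experts run per token, however many exist in total. A minimal sketch (linear maps stand in for the FFN experts of a real MoE layer; sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_experts, top_k = 4, 8, 2

W_gate = rng.normal(size=(d, n_experts))
# Each expert here is a plain linear map; real MoE experts are FFN blocks.
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):
    logits = x @ W_gate
    idx = np.argsort(logits)[-top_k:]        # indices of the top-k experts
    gates = np.exp(logits[idx] - logits[idx].max())
    gates /= gates.sum()                     # softmax over the selected experts
    # Only top_k of n_experts execute, so activated parameters stay constant
    # even as n_experts (and total model size) grows.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, idx))

y = moe_forward(rng.normal(size=d))
assert y.shape == (d,)
```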
1 code implementation • 28 Sep 2024 • Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng
In recent years, Contrastive Language-Image Pre-training (CLIP) has become a cornerstone in multimodal intelligence.
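CLIP's training objective is a symmetric InfoNCE loss over a batch of paired image and text embeddings: matched pairs sit on the diagonal of the similarity matrix, and both retrieval directions are penalized with cross-entropy. A self-contained sketch of that loss (random vectors stand in for encoder outputs):

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # scaled cosine similarities
    # Matched pairs lie on the diagonal; take cross-entropy in both
    # directions (image->text and text->image) and average.
    def xent_diag(m):
        log_sm = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_sm))
    return (xent_diag(logits) + xent_diag(logits.T)) / 2

rng = np.random.default_rng(3)
a = rng.normal(size=(4, 8))
# Perfectly matched embeddings yield a small loss at this temperature:
assert 0 < clip_loss(a, a) < 1.0
```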
1 code implementation • 21 Sep 2024 • Jiashuo Sun, Jihai Zhang, Yucheng Zhou, Zhaochen Su, Xiaoye Qu, Yu Cheng
To address these challenges, we propose a self-refinement framework designed to teach LVLMs to Selectively Utilize Retrieved Information (SURf).
1 code implementation • 30 Aug 2024 • Xiaoye Qu, Jiashuo Sun, Wei Wei, Yu Cheng
By fully grasping the information in the image and carefully considering the certainty of the potential answers when decoding, our MVP can effectively reduce hallucinations in LVLMs. Extensive experiments verify that our proposed MVP significantly mitigates the hallucination problem across four well-known LVLMs.
1 code implementation • 22 Aug 2024 • Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng
Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge.
1 code implementation • 1 Aug 2024 • Xiaoye Qu, Mingyang Song, Wei Wei, Jianfeng Dong, Yu Cheng
In this paper, we make the first attempt to mitigate this important multilingual hallucination in LVLMs.
no code implementations • 1 Aug 2024 • Xiaoye Qu, Qiyuan Chen, Wei Wei, Jiashuo Sun, Jianfeng Dong
To assess the capability of our proposed ARA model in reducing hallucination, we employ three widely used LVLMs (LLaVA-1.5, Qwen-VL, and mPLUG-Owl2) across four benchmarks.
1 code implementation • 10 Jul 2024 • Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu
Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to multi-resource real-world applications and the complexity of multi-modal processing.
2 code implementations • 24 Jun 2024 • Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng
Motivated by this limit, we investigate building MoE models from existing dense large language models.
1 code implementation • 20 Jun 2024 • Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng
Therefore, we propose a crucial question: Can we build a universal framework to handle a variety of temporal reasoning tasks?
1 code implementation • 17 Jun 2024 • Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng
Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially when the number of tasks scales.
no code implementations • 17 Jun 2024 • Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng
Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training?
1 code implementation • 17 Jun 2024 • Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng
In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input.
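The two stages described above can be illustrated on flattened weight vectors: shared knowledge as the average of the task-specific weights, exclusive knowledge as each task's residual, and an input-dependent gate recombining them. This is a toy sketch of the decomposition idea only; the softmax gating and the omission of compression are simplifying assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(4)
n_tasks, d = 3, 10
# Flattened fine-tuned weights for each task (toy stand-ins).
task_weights = rng.normal(size=(n_tasks, d))

# Stage 1: modularize -- shared knowledge as the average, exclusive
# knowledge as each task's residual from that average.
shared = task_weights.mean(axis=0)
exclusive = task_weights - shared          # one residual vector per task

def merge(task_scores):
    # Stage 2: dynamically recombine shared + an input-dependent mix
    # of the exclusive residuals (softmax gate over task scores).
    gates = np.exp(task_scores - np.max(task_scores))
    gates /= gates.sum()
    return shared + gates @ exclusive

# A near-one-hot gate recovers the original task-specific weights:
merged = merge(np.array([50.0, 0.0, 0.0]))
assert np.allclose(merged, task_weights[0], atol=1e-6)
```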
1 code implementation • 13 Jun 2024 • Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Yan Bowen, Yu Cheng, Min Zhang
Temporal reasoning is fundamental for large language models (LLMs) to comprehend the world.
1 code implementation • 11 Jun 2024 • Zhenyi Lu, Jie Tian, Wei Wei, Xiaoye Qu, Yu Cheng, Wenfeng Xie, Dangyang Chen
Our approach is grounded in the empirical observation that pairwise comparisons can effectively alleviate boundary ambiguity and inherent bias.
1 code implementation • 3 Jun 2024 • Zhuojun Ding, Wei Wei, Xiaoye Qu, Dangyang Chen
In this paper, we propose a Global-Local Denoising framework (GLoDe) for cross-lingual NER.
no code implementations • 5 Mar 2024 • Dong Yao, Asaad Alghamdi, Qingrong Xia, Xiaoye Qu, Xinyu Duan, Zhefeng Wang, Yi Zheng, Baoxing Huai, Peilun Cheng, Zhou Zhao
Although DC-Match is a simple yet effective method for semantic matching, it depends heavily on external NER techniques to identify the keywords of sentences, which limits semantic-matching performance for low-resource languages, where satisfactory NER tools are usually hard to obtain.
1 code implementation • 19 Feb 2024 • Jihai Zhang, Xiang Lan, Xiaoye Qu, Yu Cheng, Mengling Feng, Bryan Hooi
Self-Supervised Contrastive Learning has proven effective in deriving high-quality representations from unlabeled data.
no code implementations • 4 Jan 2024 • Rikui Huang, Wei Wei, Xiaoye Qu, Wenfeng Xie, Xianling Mao, Dangyang Chen
A Temporal Knowledge Graph (TKG) extends a regular knowledge graph by attaching a time scope to each fact.
1 code implementation • 26 Dec 2023 • Chenghao Fan, Wei Wei, Xiaoye Qu, Zhenyi Lu, Wenfeng Xie, Yu Cheng, Dangyang Chen
Recently, prompt-tuning with pre-trained language models (PLMs) has demonstrated significantly enhanced performance on relation extraction (RE) tasks.
1 code implementation • 9 Nov 2023 • Tong Zhu, Junfei Ren, Zijian Yu, Mengsong Wu, Guoliang Zhang, Xiaoye Qu, Wenliang Chen, Zhefeng Wang, Baoxing Huai, Min Zhang
Sharing knowledge between information extraction tasks has always been a challenge due to the diverse data formats and task variations.
1 code implementation • 6 Nov 2023 • Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang
In this manner, our framework is able to learn the unified representations of uni-modal or multi-modal skeleton input, which is flexible to different kinds of modality input for robust action understanding in practical cases.
1 code implementation • 22 Oct 2023 • Zhenyi Lu, Wei Wei, Xiaoye Qu, Xianling Mao, Dangyang Chen, Jixiong Chen
Subsequently, we employ a conditional variational auto-encoder to align with the dense personalized responses within a latent joint attribute space.
1 code implementation • 20 Jul 2023 • Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen
TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results.
1 code implementation • 17 May 2023 • Jianfeng Dong, Xiaoman Peng, Zhe Ma, Daizong Liu, Xiaoye Qu, Xun Yang, Jixiang Zhu, Baolong Liu
As the attribute-specific similarity typically corresponds to the specific subtle regions of images, we propose a Region-to-Patch Framework (RPF) that consists of a region-aware branch and a patch-aware branch to extract fine-grained attribute-related visual features for precise retrieval in a coarse-to-fine manner.
no code implementations • 6 May 2023 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Zichuan Xu, Haozhao Wang, Xing Di, Weining Lu, Yu Cheng
This paper addresses the temporal sentence grounding (TSG) task.
no code implementations • 7 Feb 2023 • Xiaoye Qu, Yingjie Gu, Qingrong Xia, Zechang Li, Zhefeng Wang, Baoxing Huai
In this paper, we provide a comprehensive review of the development of Arabic NER, especially the recent advances in deep learning and pre-trained language models.
1 code implementation • ICCV 2023 • Jianfeng Dong, Minsong Zhang, Zheng Zhang, Xianke Chen, Daizong Liu, Xiaoye Qu, Xun Wang, Baolong Liu
During the knowledge distillation, an inheritance student branch is devised to absorb the knowledge from the teacher model.
1 code implementation • 13 Dec 2022 • Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou
Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples.
no code implementations • 27 Jul 2022 • Daizong Liu, Xiaoye Qu, Wei Hu
In this paper, we study the above issue of selection biases and accordingly propose a Debiasing-TSG (D-TSG) model to filter and remove the negative biases in both vision and language modalities for enhancing the model generalization ability.
no code implementations • Findings (NAACL) 2022 • Yingjie Gu, Xiaoye Qu, Zhefeng Wang, Yi Zheng, Baoxing Huai, Nicholas Jing Yuan
Recent years have witnessed the improving performance of Chinese Named Entity Recognition (NER) from proposing new frameworks or incorporating word lexicons.
1 code implementation • 23 Jan 2022 • Jianfeng Dong, Yabing Wang, Xianke Chen, Xiaoye Qu, Xirong Li, Yuan He, Xun Wang
In this work, we concentrate on video representation learning, an essential component for text-to-video retrieval.
no code implementations • 14 Jan 2022 • Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou
Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.
no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu
Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively.
no code implementations • 3 Jan 2022 • Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou
To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely appeared content in TSG tasks.
1 code implementation • 11 Dec 2021 • Tong Zhu, Xiaoye Qu, Wenliang Chen, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan, Min Zhang
Most previous studies of document-level event extraction mainly focus on building argument chains in an autoregressive way, which achieves a certain success but is inefficient in both training and inference.
Ranked #3 on Document-level Event Extraction on ChFinAnn
no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Pan Zhou
A key solution to temporal sentence grounding (TSG) exists in how to learn effective alignment between vision and language features extracted from an untrimmed video and a sentence description.
no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou
However, the performance of the bottom-up model is inferior to that of its top-down counterpart, as it fails to exploit segment-level interaction.
no code implementations • 27 Jul 2021 • Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye
Specifically, at the coarse-grained stage, we design a dual-discriminator strategy to adapt the source domain to be close to the targets from the perspectives of both global and local feature space via adversarial learning.
no code implementations • CVPR 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie
This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.
1 code implementation • 18 Feb 2021 • Zhe Ma, Fenghao Liu, Jianfeng Dong, Xiaoye Qu, Yuan He, Shouling Ji
In this paper, we focus on the cross-modal similarity measurement, and propose a novel Hierarchical Similarity Learning (HSL) network.
no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang
The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.
no code implementations • 7 Jan 2021 • Yingjie Gu, Xiaoye Qu, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan, Xiaolin Gui
Entity linking (EL) for the rapidly growing short text (e.g., search queries and news titles) is critical to industrial applications.
no code implementations • COLING 2020 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou
In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.
no code implementations • 6 Aug 2020 • Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou
In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.
1 code implementation • 4 Aug 2020 • Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu
To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
no code implementations • 12 Aug 2019 • Zhikang Zou, Huiliang Shao, Xiaoye Qu, Wei Wei, Pan Zhou
Recently, convolutional neural networks (CNNs) have become the de facto method for crowd counting.
no code implementations • 7 Aug 2019 • Zhikang Zou, Yu Cheng, Xiaoye Qu, Shouling Ji, Xiaoxiao Guo, Pan Zhou
ACM-CNN consists of three types of modules: a coarse network, a fine network, and a smooth network.
no code implementations • NAACL 2019 • Xiaoye Qu, Zhikang Zou, Yu Cheng, Yang Yang, Pan Zhou
Cross-domain sentiment classification aims to predict sentiment polarity on a target domain utilizing a classifier learned from a source domain.