Search Results for author: Ganqu Cui

Found 34 papers, 28 papers with code

TTRL: Test-Time Reinforcement Learning

1 code implementation • 22 Apr 2025 • Yuxin Zuo, Kaiyan Zhang, Shang Qu, Li Sheng, Xuekai Zhu, Biqing Qi, Youbang Sun, Ganqu Cui, Ning Ding, BoWen Zhou

Furthermore, although TTRL is supervised only by the Maj@N metric, it consistently surpasses the upper limit of the initial model and approaches the performance of models trained directly on test data with ground-truth labels.

Math reinforcement-learning +2
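
The Maj@N supervision mentioned in the TTRL entry above can be pictured as majority voting over sampled answers. The following is a minimal sketch, not the authors' implementation: it assumes that N answers are sampled for one unlabeled question, that the most frequent answer is treated as a pseudo-label, and that each sample receives a binary reward for agreeing with it. The function name `majority_vote_rewards` is hypothetical.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Return (majority_answer, rewards) for a list of sampled answers.

    Assumption: a sample earns reward 1.0 if it matches the most frequent
    answer among the N samples, and 0.0 otherwise.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) returns the single most frequent (answer, count) pair.
    majority_answer, _ = counts.most_common(1)[0]
    rewards = [1.0 if a == majority_answer else 0.0 for a in answers]
    return majority_answer, rewards


if __name__ == "__main__":
    label, rewards = majority_vote_rewards(["4", "5", "4", "4"])
    print(label, rewards)
```

No ground-truth label appears anywhere in the sketch, which is the point of the test-time setting described in the abstract.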

Learning to Reason under Off-Policy Guidance

no code implementations • 21 Apr 2025 • Jianhao Yan, Yafu Li, Zican Hu, Zhi Wang, Ganqu Cui, Xiaoye Qu, Yu Cheng, Yue Zhang

Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning (RL) with simple rule-based rewards.

Math Reinforcement Learning (RL)

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

no code implementations • 4 Apr 2025 • Bingxiang He, Wenbin Zhang, Jiaxi Song, Cheng Qian, Zixuan Fu, Bowen Sun, Ning Ding, Haiwen Hong, Longtao Huang, Hui Xue, Ganqu Cui, Wanxiang Che, Zhiyuan Liu, Maosong Sun

Preference learning is critical for aligning large language models (LLMs) with human values, yet its success hinges on high-quality datasets comprising three core components: Preference Annotations, Instructions, and Response Pairs.

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

1 code implementation • 27 Mar 2025 • Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, BoWen Zhou, Yu Cheng

Recent Large Reasoning Models (LRMs), such as DeepSeek-R1 and OpenAI o1, have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference.

Survey

Process Reinforcement through Implicit Rewards

2 code implementations • 3 Feb 2025 • Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, BoWen Zhou, Ning Ding

While dense rewards also offer an appealing choice for the reinforcement learning (RL) of LLMs since their fine-grained rewards have the potential to address some inherent issues of outcome rewards, such as training efficiency and credit assignment, this potential remains largely unrealized.

Math Reinforcement Learning (RL)

From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning

1 code implementation • 21 Jan 2025 • Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, Yu Cheng

Scaling data and model size has been proven effective for boosting the performance of large language models.

Free Process Rewards without Process Labels

2 code implementations • 2 Dec 2024 • Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, BoWen Zhou, Zhiyuan Liu, Hao Peng

The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives.

Math
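
The parameterization described in the entry above (outcome reward as a log-likelihood ratio of the policy and reference models) can be illustrated with a short sketch. This is an illustration under stated assumptions, not the paper's code: the per-token log-probabilities are assumed to be precomputed for the same response under both models, and the scaling coefficient `beta` and the function name `implicit_outcome_reward` are hypothetical.

```python
def implicit_outcome_reward(policy_logprobs, ref_logprobs, beta=1.0):
    """Outcome reward as a scaled log-likelihood ratio.

    Assumption: both lists hold per-token log-probabilities of the same
    response, so summing the differences gives
    log pi(y) - log pi_ref(y) for the whole response.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("both models must score the same tokens")
    log_ratio = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return beta * log_ratio


if __name__ == "__main__":
    # Two-token response scored by a policy and a reference model.
    print(implicit_outcome_reward([-1.0, -2.0], [-1.5, -2.5], beta=0.5))
```

Because the reward is defined purely through the two models' likelihoods, it stays the same quantity whichever loss objective is used to train the policy.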

Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention

1 code implementation • 4 Nov 2024 • Xingtai Lv, Ning Ding, Kaiyan Zhang, Ermo Hua, Ganqu Cui, BoWen Zhou

Improving the effectiveness and efficiency of large language models (LLMs) simultaneously is a critical yet challenging research goal.

Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

no code implementations • 17 Jun 2024 • Bingxiang He, Ning Ding, Cheng Qian, Jia Deng, Ganqu Cui, Lifan Yuan, Huan-ang Gao, Huimin Chen, Zhiyuan Liu, Maosong Sun

For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.

Continual Learning Zero-shot Generalization

UltraMedical: Building Specialized Generalists in Biomedicine

1 code implementation • 6 Jun 2024 • Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu Jinfang, Zhiyuan Liu, BoWen Zhou

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas.

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

no code implementations • 13 Mar 2024 • Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, BoWen Zhou, Zhiyuan Liu, Maosong Sun

Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously.

Math

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

1 code implementation • 29 Feb 2024 • Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Zexu Sun, Bowen Sun, Huimin Chen, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax": a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness).

Navigate

Noise Contrastive Alignment of Language Models with Explicit Rewards

3 code implementations • 8 Feb 2024 • Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, Jun Zhu

We evaluate our methods in both reward and preference settings with Mistral-8×7B and 7B models.

Language Modelling Math

INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair

1 code implementation • 16 Nov 2023 • Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu

INTERVENOR prompts Large Language Models (LLMs) to play distinct roles during the code repair process, functioning as both a Code Learner and a Code Teacher.

Code Repair Code Translation

UltraFeedback: Boosting Language Models with Scaled AI Feedback

4 code implementations • 2 Oct 2023 • Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun

Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models, serving as a solid foundation for future feedback learning research.

Language Modelling

From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework

1 code implementation • 29 May 2023 • Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, Hanlu Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng Ji

In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework.

Adversarial Attack

Decoder Tuning: Efficient Language Understanding as Decoding

3 code implementations • 16 Dec 2022 • Ganqu Cui, Wentao Li, Ning Ding, Longtao Huang, Zhiyuan Liu, Maosong Sun

With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to provide users with only the inference APIs, namely the model-as-a-service (MaaS) setting.

Decoder Natural Language Understanding

Few-shot Classification with Hypersphere Modeling of Prototypes

no code implementations • 10 Nov 2022 • Ning Ding, Yulin Chen, Ganqu Cui, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere.

Classification Few-Shot Learning +1
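
The metric described in the entry above (distance from a data point to the surface of a hypersphere prototype) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Euclidean distance, a signed distance that goes negative inside the sphere, and nearest-surface classification. The function names and the `prototypes` dictionary layout are hypothetical.

```python
import math


def distance_to_hypersphere_surface(point, center, radius):
    """Signed Euclidean distance from `point` to the sphere's surface.

    Assumption: the value is negative when the point lies inside the
    hypersphere and positive when it lies outside.
    """
    return math.dist(point, center) - radius


def classify(point, prototypes):
    """Pick the class whose hypersphere surface is closest to `point`.

    `prototypes` maps a class label to a (center, radius) pair.
    """
    return min(
        prototypes,
        key=lambda label: distance_to_hypersphere_surface(point, *prototypes[label]),
    )


if __name__ == "__main__":
    prototypes = {"a": ((0.0, 0.0), 2.0), "b": ((10.0, 10.0), 1.0)}
    print(classify((3.0, 4.0), prototypes))
```

Each class needs only a center and a radius here, so classification reduces to one distance computation per class.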

A Close Look into the Calibration of Pre-trained Language Models

2 code implementations • 31 Oct 2022 • Yangyi Chen, Lifan Yuan, Ganqu Cui, Zhiyuan Liu, Heng Ji

We observe a consistent change in calibration performance across six factors.

Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP

2 code implementations • 19 Oct 2022 • Yangyi Chen, Hongcheng Gao, Ganqu Cui, Fanchao Qi, Longtao Huang, Zhiyuan Liu, Maosong Sun

We discuss the deficiencies in previous work and propose our suggestions that the research on the Security-oriented adversarial NLP (SoadNLP) should: (1) evaluate their methods on security tasks to demonstrate the real-world concerns; (2) consider real-world attackers' goals, instead of developing impractical methods.

Data Augmentation

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

1 code implementation • 17 Jun 2022 • Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun

However, we highlight two issues in previous backdoor learning evaluations: (1) The differences between real-world scenarios (e.g., releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns and thus requires specific evaluation protocols; (2) The evaluation metrics only consider whether the attacks could flip the models' predictions on poisoned samples and retain performance on benign samples, but ignore that poisoned samples should also be stealthy and semantic-preserving.

text similarity

Exploring the Universal Vulnerability of Prompt-based Learning Paradigm

1 code implementation • Findings (NAACL) 2022 • Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu

Prompt-based learning paradigm bridges the gap between pre-training and fine-tuning, and works effectively under the few-shot setting.

Prototypical Verbalizer for Prompt-based Few-shot Tuning

1 code implementation • ACL 2022 • Ganqu Cui, Shengding Hu, Ning Ding, Longtao Huang, Zhiyuan Liu

However, manual verbalizers heavily depend on domain-specific prior knowledge and human efforts, while finding appropriate label words automatically still remains challenging. In this work, we propose the prototypical verbalizer (ProtoVerb) which is built directly from training data.

Contrastive Learning Entity Typing +2

Evaluating Modules in Graph Contrastive Learning

1 code implementation • 15 Jun 2021 • Ganqu Cui, Yufeng Du, Cheng Yang, Jie Zhou, Liang Xu, Xing Zhou, Xingyi Cheng, Zhiyuan Liu

The recent emergence of contrastive learning approaches facilitates their application to graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature.

Contrastive Learning Graph Classification +1

Adaptive Graph Encoder for Attributed Graph Embedding

1 code implementation • 3 Jul 2020 • Ganqu Cui, Jie Zhou, Cheng Yang, Zhiyuan Liu

Experimental results show that AGE consistently outperforms state-of-the-art graph embedding methods considerably on these tasks.

Clustering Graph Embedding +2

Graph Neural Networks: A Review of Methods and Applications

5 code implementations • 20 Dec 2018 • Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, LiFeng Wang, Changcheng Li, Maosong Sun

Lots of learning tasks require dealing with graph data which contains rich relation information among elements.

Graph Attention Graph Neural Network
