Search Results for author: Shusheng Xu

Found 10 papers, 5 papers with code

How Far Are We from Optimal Reasoning Efficiency?

1 code implementation · 8 Jun 2025 · Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu

To reduce the efficiency gap, we propose REO-RL, a class of Reinforcement Learning algorithms that minimizes REG by targeting a sparse set of token budgets.

16k, Benchmarking, +1

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

1 code implementation · 30 May 2025 · Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu

Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where rollouts in each training batch are generated by the same model.

Math, Reinforcement Learning (RL)

On Designing Effective RL Reward at Training Time for LLM Reasoning

no code implementations · 19 Oct 2024 · Jiaxuan Gao, Shusheng Xu, Wenjie Ye, Weilin Liu, Chuyi He, Wei Fu, Zhiyu Mei, Guangju Wang, Yi Wu

In this work, we evaluate popular reward models for RL training, including the Outcome-supervised Reward Model (ORM) and the Process-supervised Reward Model (PRM), and train a collection of LLMs for math problems using RL by combining these learned rewards with success rewards.

GSM8K, Math

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

1 code implementation · 16 Apr 2024 · Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO).

Code Generation

A Benchmark for Low-Switching-Cost Reinforcement Learning

no code implementations · 13 Dec 2021 · Shusheng Xu, Yancheng Liang, Yunfei Li, Simon Shaolei Du, Yi Wu

A ubiquitous requirement in many practical reinforcement learning (RL) applications, including medical treatment, recommendation system, education and robotics, is that the deployed policy that actually interacts with the environment cannot change frequently.

Atari Games, reinforcement-learning, +2

Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension

no code implementations · 13 Dec 2021 · Shusheng Xu, Yichen Liu, Xiaoyu Yi, Siyuan Zhou, Huizi Li, Yi Wu

We present Native Chinese Reader (NCR), a new machine reading comprehension (MRC) dataset with particularly long articles in both modern and classical Chinese.

Articles, Common Sense Reasoning, +1

PhyloTransformer: A Discriminative Model for Mutation Prediction Based on a Multi-head Self-attention Mechanism

no code implementations · 3 Nov 2021 · Yingying Wu, Shusheng Xu, Shing-Tung Yau, Yi Wu

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an ongoing pandemic infecting 219 million people as of 10/19/21, with a 3.6% mortality rate.

Language Modelling

Sequence Level Contrastive Learning for Text Summarization

1 code implementation · 8 Sep 2021 · Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei

In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary, and its model-generated summaries as different views of the same mean representation and maximize the similarities between them during training.

Abstractive Text Summarization, Contrastive Learning, +2

Deep Q-Learning with Low Switching Cost

no code implementations · 1 Jan 2021 · Shusheng Xu, Simon Shaolei Du, Yi Wu

We initiate the study of deep reinforcement learning problems that require low switching cost, i.e., a small number of policy switches during training.

Atari Games, Deep Reinforcement Learning, +3
