Search Results for author: Chuyi He

Found 2 papers, 1 paper with code

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

1 code implementation • 30 May 2025 • Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu

Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where rollouts in each training batch are generated by the same model.

Math, Reinforcement Learning (RL)

On Designing Effective RL Reward at Training Time for LLM Reasoning

No code implementations • 19 Oct 2024 • Jiaxuan Gao, Shusheng Xu, Wenjie Ye, Weilin Liu, Chuyi He, Wei Fu, Zhiyu Mei, Guangju Wang, Yi Wu

In this work, we evaluate popular reward models for RL training, including the Outcome-supervised Reward Model (ORM) and the Process-supervised Reward Model (PRM), and train a collection of LLMs for math problems using RL by combining these learned rewards with success rewards.

GSM8K, Math
