Search Results for author: Boyang Hong

Found 3 papers, 3 papers with code

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

1 code implementation • 8 Feb 2024 • Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.

GSM8K reinforcement-learning +1
