1 code implementation • 22 Dec 2024 • Yuxiang Zhang, YuQi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang
OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation models and offers a new paradigm for fine-tuning beyond simple pattern imitation.
1 code implementation • 29 Nov 2024 • Yuxiang Zhang, Shangxi Wu, YuQi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jitao Sang
This technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks.