1 code implementation • 22 Dec 2024 • Yuxiang Zhang, YuQi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang
OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation models and offers a new paradigm for fine-tuning beyond simple pattern imitation.
1 code implementation • 29 Nov 2024 • Yuxiang Zhang, Shangxi Wu, YuQi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jitao Sang
The technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks.
no code implementations • 12 Oct 2024 • Yunfan Yang, Chaoquan Jiang, Zhiyu Lin, Jinlin Xiao, Jiaming Zhang, Jitao Sang
Existing debiasing methods struggle to obtain sufficient image samples for minority groups and incur high costs for group labeling.
1 code implementation • 8 Jul 2024 • Yanxu Zhu, Jinlin Xiao, Yuhang Wang, Jitao Sang
Recent studies have demonstrated that large language models (LLMs) are susceptible to being misled by false premise questions (FPQs), leading to errors in factual knowledge, known as factuality hallucination.
1 code implementation • 1 Feb 2024 • Jitao Sang, Yuhang Wang, Jing Zhang, Yanxu Zhu, Chao Kong, Junhong Ye, Shuyu Wei, Jinlin Xiao
In the first phase, based on human supervision, the quality of weak supervision is enhanced through a combination of scalable oversight and ensemble learning, reducing the capability gap between weak teachers and strong students.