1 code implementation • 5 Jan 2025 • Simon Park, Abhishek Panigrahi, Yun Cheng, Dingli Yu, Anirudh Goyal, Sanjeev Arora
We seek strategies for training on the SIMPLE version of the tasks that improve performance on the corresponding HARD task, i.e., S2H generalization.
no code implementations • 27 Aug 2024 • Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora
Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench.
1 code implementation • 30 Jul 2024 • Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal
We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions.