3 code implementations • Preprint 2023 • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.
Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data)
no code implementations • ICML 2018 • Aditya Grover, Maruan Al-Shedivat, Jayesh K. Gupta, Yura Burda, Harrison Edwards
Modeling agent behavior is central to understanding the emergence of complex phenomena in multiagent systems.
no code implementations • ICLR 2018 • Dustin Tran, Yura Burda, Ilya Sutskever
We examine how learning from unaligned data can both improve the data efficiency of supervised tasks and enable alignments without any supervision.