Search Results for author: Karl Cobbe

Found 8 papers, 7 papers with code

Let's Verify Step by Step

3 code implementations • Preprint 2023 • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data)

Active Learning, Math, +2 more

WebGPT: Browser-assisted question-answering with human feedback

2 code implementations • 17 Dec 2021 • Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.

Imitation Learning, Navigate, +1 more

Training Verifiers to Solve Math Word Problems

3 code implementations • 27 Oct 2021 • Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

GSM8K, Math, +1 more

Batch size-invariance for policy optimization

1 code implementation • 1 Oct 2021 • Jacob Hilton, Karl Cobbe, John Schulman

We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters.

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

no code implementations • 29 Mar 2021 • Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe

We present the design of a centralized benchmark for reinforcement learning that helps measure sample efficiency and generalization through scalable end-to-end evaluation of the training and rollout phases of thousands of user-submitted code bases.

reinforcement-learning, Reinforcement Learning (RL)

Phasic Policy Gradient

3 code implementations • 9 Sep 2020 • Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.

Ranked #1 on Reinforcement Learning (RL) on ProcGen (using extra training data)

Reinforcement Learning (RL)

Leveraging Procedural Generation to Benchmark Reinforcement Learning

6 code implementations • ICML 2020 • Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman

We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning.

Procgen Hard (100M), reinforcement-learning, +1 more

Quantifying Generalization in Reinforcement Learning

1 code implementation • 6 Dec 2018 • Karl Cobbe, Oleg Klimov, Chris Hesse, Tae-hoon Kim, John Schulman

In this paper, we investigate the problem of overfitting in deep reinforcement learning.

Data Augmentation, L2 Regularization, +2 more
