Search Results for author: Bowen Baker

Found 10 papers, 7 papers with code

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

no code implementations • 14 Dec 2023 • Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu

Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior, for example to evaluate whether a model faithfully followed instructions or generated safe outputs.


Let's Verify Step by Step

3 code implementations • Preprint 2023 • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data)

Active Learning Math +2

1,279


Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

2 code implementations • 23 Jun 2022 • Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities.

Imitation Learning reinforcement-learning +1

1,205


Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft

no code implementations • 28 Jun 2021 • Ingmar Kanitscheider, Joost Huizinga, David Farhi, William Hebgen Guss, Brandon Houghton, Raul Sampedro, Peter Zhokhov, Bowen Baker, Adrien Ecoffet, Jie Tang, Oleg Klimov, Jeff Clune

An important challenge in reinforcement learning is training agents that can solve a wide variety of tasks.

Reinforcement Learning (RL)


Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences

1 code implementation • NeurIPS 2020 • Bowen Baker

Multi-agent reinforcement learning (MARL) has shown recent success in increasingly complex fixed-team zero-sum environments.

Multi-agent Reinforcement Learning

1,579


Emergent Tool Use From Multi-Agent Autocurricula

3 code implementations • ICLR 2020 • Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch

Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination.

Reinforcement Learning (RL)

1,579


Learning Dexterous In-Hand Manipulation

no code implementations • 1 Aug 2018 • OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand.

Friction reinforcement-learning +1


Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

30 code implementations • 26 Feb 2018 • Matthias Plappert, Marcin Andrychowicz, Alex Ray, Bob McGrew, Bowen Baker, Glenn Powell, Jonas Schneider, Josh Tobin, Maciek Chociej, Peter Welinder, Vikash Kumar, Wojciech Zaremba

The purpose of this technical report is two-fold.

Continuous Control Multi-Goal Reinforcement Learning +3

141


Accelerating Neural Architecture Search using Performance Prediction

2 code implementations • ICLR 2018 • Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik

Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations.

Hyperparameter Optimization Language Modelling +3

136


Designing Neural Network Architectures using Reinforcement Learning

5 code implementations • 7 Nov 2016 • Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar

We introduce MetaQNN, a meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task.

General Classification Image Classification +3

136

