Search Results for author: Bowen Baker

Found 10 papers, 7 papers with code

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

no code implementations 14 Dec 2023 Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu

Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs.

Let's Verify Step by Step

3 code implementations Preprint 2023 Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data)

Active Learning Math +2

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

2 code implementations 23 Jun 2022 Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for training models with broad, general capabilities for text, images, and other modalities.

Imitation Learning reinforcement-learning +1

Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences

1 code implementation NeurIPS 2020 Bowen Baker

Multi-agent reinforcement learning (MARL) has shown recent success in increasingly complex fixed-team zero-sum environments.

Multi-agent Reinforcement Learning

Emergent Tool Use From Multi-Agent Autocurricula

3 code implementations ICLR 2020 Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch

Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination.

reinforcement-learning Reinforcement Learning (RL)

Learning Dexterous In-Hand Manipulation

no code implementations 1 Aug 2018 OpenAI, Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, Wojciech Zaremba

We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand.

Friction reinforcement-learning +1

Accelerating Neural Architecture Search using Performance Prediction

2 code implementations ICLR 2018 Bowen Baker, Otkrist Gupta, Ramesh Raskar, Nikhil Naik

Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations.

Hyperparameter Optimization Language Modelling +3

Designing Neural Network Architectures using Reinforcement Learning

5 code implementations 7 Nov 2016 Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar

We introduce MetaQNN, a meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task.

General Classification Image Classification +3
