no code implementations • ICML 2020 • Heewoo Jun, Rewon Child, Mark Chen, John Schulman, Aditya Ramesh, Alec Radford, Ilya Sutskever
We present distribution augmentation (DistAug), a simple and powerful method of regularizing generative models.
no code implementations • 31 Jan 2023 • Jacob Hilton, Jie Tang, John Schulman
Recent work has shown that, in generative modeling, cross-entropy loss improves smoothly with model size and training compute, following a power law plus constant scaling law.
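For reference, a "power law plus constant" fit of loss against training compute $C$ can be written (with generic symbols, not necessarily the paper's exact parameterization) as
$L(C) = L_\infty + (C_0 / C)^{\alpha}$,
where $L_\infty$ is the irreducible loss and $\alpha$ the scaling exponent.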
no code implementations • 19 Oct 2022 • Leo Gao, John Schulman, Jacob Hilton
In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences.
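As a rough illustration of what "a reward model trained to predict human preferences" typically involves (a minimal sketch of the standard pairwise-comparison loss, not this paper's specific experimental setup; the reward values below are stand-ins for a scalar-output model):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise (Bradley-Terry style) loss: push the reward of the human-preferred
    completion above the reward of the rejected completion."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with stand-in reward-model outputs for a batch of comparison pairs.
r_chosen = torch.randn(8, requires_grad=True)
r_rejected = torch.randn(8, requires_grad=True)
loss = preference_loss(r_chosen, r_rejected)
loss.backward()
```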
1 code implementation • 28 Jul 2022 • Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, Mark Chen
To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span.
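A minimal sketch of the kind of fill-in-the-middle data transformation being ablated: cut a document into (prefix, middle, suffix) and move the middle to the end behind sentinel tokens. The sentinel strings below are placeholders, not the actual special tokens used in training.

```python
import random

PRE, MID, SUF = "<PRE>", "<MID>", "<SUF>"  # placeholder sentinel tokens

def fim_transform(doc: str, rng: random.Random) -> str:
    """Split a document at two random points and rearrange it so the model learns
    to generate the middle span conditioned on both prefix and suffix."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

print(fim_transform("def add(a, b):\n    return a + b\n", random.Random(0)))
```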
3 code implementations • 4 Mar 2022 • Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
no code implementations • 17 Dec 2021 • Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
2 code implementations • 27 Oct 2021 • Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.
1 code implementation • 1 Oct 2021 • Jacob Hilton, Karl Cobbe, John Schulman
We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters.
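For intuition only (this is the textbook SGD case, not the paper's PPO-specific analysis): with plain SGD and a small learning rate $\eta$, multiplying the batch size by $k$ can be roughly compensated by multiplying the learning rate by $k$. If the large batch is the union of the $k$ small batches, one large-batch step satisfies
$\theta - k\eta\, g_{kB}(\theta) = \theta - \eta \sum_{i=1}^{k} g_{B}^{(i)}(\theta)$,
which approximates taking $k$ successive small-batch steps with learning rate $\eta$ when $\eta$ is small.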
no code implementations • 28 Sep 2021 • Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings.
no code implementations • 29 Mar 2021 • Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe
We present the design of a centralized benchmark for reinforcement learning that measures sample efficiency and generalization through scalable, end-to-end evaluation of the training and rollout phases of thousands of user-submitted code bases.
no code implementations • 26 Jan 2021 • William H. Guss, Mario Ynocente Castro, Sam Devlin, Brandon Houghton, Noboru Sean Kuno, Crissman Loomis, Stephanie Milani, Sharada Mohanty, Keisuke Nakata, Ruslan Salakhutdinov, John Schulman, Shinya Shiroshita, Nicholay Topin, Avinash Ummadisingu, Oriol Vinyals
Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples, affording only a shrinking segment of the AI community access to their development.
no code implementations • 28 Oct 2020 • Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish
The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains.
3 code implementations • 9 Sep 2020 • Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.
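A minimal sketch of the auxiliary-phase objective in this kind of phasic scheme: fit the value function on stored rollouts while a KL "behavioral cloning" term keeps the policy close to its snapshot from before the auxiliary phase. Network definitions and rollout collection are assumed and omitted; `beta_clone` is a coefficient name chosen here for illustration.

```python
import torch

def ppg_aux_loss(values, value_targets, logits_new, logits_old, beta_clone=1.0):
    """Auxiliary-phase loss: value regression plus a KL penalty (discrete actions)
    that prevents the auxiliary updates from distorting the policy."""
    value_loss = 0.5 * (values - value_targets).pow(2).mean()
    log_p_new = torch.log_softmax(logits_new, dim=-1)
    log_p_old = torch.log_softmax(logits_old, dim=-1)
    kl = (log_p_old.exp() * (log_p_old - log_p_new)).sum(-1).mean()
    return value_loss + beta_clone * kl
```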
Ranked #1 on Reinforcement Learning (RL) on ProcGen
5 code implementations • ICML 2020 • Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman
We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning.
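A minimal usage sketch, assuming the `procgen` package's Gym registration and the classic Gym step API; the keyword arguments `num_levels` and `distribution_mode` follow the project's documented options.

```python
import gym

# Train on a fixed set of 200 procedurally generated levels of CoinRun.
env = gym.make("procgen:procgen-coinrun-v0", num_levels=200, distribution_mode="easy")
obs = env.reset()
while True:
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
```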
no code implementations • 7 Apr 2019 • Thomas Anthony, Robert Nishihara, Philipp Moritz, Tim Salimans, John Schulman
Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online.
no code implementations • 6 Feb 2019 • Jacob Jackson, John Schulman
We then formulate an optimization problem whose objective is to minimize the distance between the labeled and the unlabeled data in this space, and we solve it by gradient descent on the imputed labels.
1 code implementation • 6 Dec 2018 • Karl Cobbe, Oleg Klimov, Chris Hesse, Tae-hoon Kim, John Schulman
In this paper, we investigate the problem of overfitting in deep reinforcement learning.
1 code implementation • 14 Sep 2018 • Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel
Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.
Model-based Reinforcement Learning • reinforcement-learning
3 code implementations • 10 Apr 2018 • Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, John Schulman
In this report, we present a new reinforcement learning (RL) benchmark based on the Sonic the Hedgehog (TM) video game franchise.
12 code implementations • 8 Mar 2018 • Alex Nichol, Joshua Achiam, John Schulman
This paper considers meta-learning problems, where there is a distribution of tasks, and we would like to obtain an agent that performs well (i.e., learns quickly) when presented with a previously unseen task sampled from this distribution.
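A minimal sketch of the first-order meta-update studied in this paper (the Reptile rule): run a few steps of SGD on a sampled task, then move the initialization toward the adapted parameters. The 1-D regression task below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Toy task: fit theta to a random target under squared loss."""
    target = rng.normal(size=3)
    return lambda theta: theta - target          # gradient of 0.5*||theta - target||^2

def reptile_update(theta, inner_steps=5, inner_lr=0.1, meta_lr=0.1):
    grad = sample_task()
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= inner_lr * grad(phi)              # inner-loop SGD on the sampled task
    return theta + meta_lr * (phi - theta)       # move initialization toward adapted params

theta = np.zeros(3)
for _ in range(1000):
    theta = reptile_update(theta)
```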
3 code implementations • ICLR 2018 • Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman
We develop a meta-learning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives: policies that are executed for large numbers of timesteps.
no code implementations • 28 Sep 2017 • Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, Sergey Levine
Furthermore, deployment of DRL on physical systems remains challenging due to sample inefficiency.
158 code implementations • 20 Jul 2017 • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
Ranked #4 on Continuous Control on Lunar Lander (OpenAI Gym)
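A minimal sketch of the clipped variant of the surrogate objective; per-sample log-probabilities and advantage estimates are assumed to come from rollouts, and 0.2 is a commonly used clip range.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate: take the pessimistic minimum of the unclipped and
    clipped probability-ratio terms, then negate for gradient ascent."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```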
3 code implementations • 1 Jul 2017 • Tambet Matiisen, Avital Oliver, Taco Cohen, John Schulman
We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on.
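A simplified illustration of a bandit-style Teacher (not the paper's exact algorithm): track recent learning progress on each subtask and mostly pick the subtask whose score is changing fastest, with occasional random picks.

```python
import random

class SimpleTeacher:
    """Epsilon-greedy teacher that favors subtasks with the largest recent
    (absolute) change in the Student's score."""
    def __init__(self, num_tasks, eps=0.1):
        self.progress = [0.0] * num_tasks
        self.last_score = [0.0] * num_tasks
        self.eps = eps

    def choose_task(self):
        if random.random() < self.eps:
            return random.randrange(len(self.progress))
        return max(range(len(self.progress)), key=lambda t: abs(self.progress[t]))

    def update(self, task, score, alpha=0.3):
        delta = score - self.last_score[task]
        self.progress[task] = (1 - alpha) * self.progress[task] + alpha * delta
        self.last_score[task] = score
```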
no code implementations • ICLR 2018 • Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman
We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning.
no code implementations • 21 Apr 2017 • John Schulman, Xi Chen, Pieter Abbeel
A partial explanation may be that $Q$-learning methods are secretly implementing policy gradient updates: we show that there is a precise equivalence between $Q$-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, that "soft" (entropy-regularized) $Q$-learning is exactly equivalent to a policy gradient method.
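The key relation behind this equivalence, with temperature $\tau$: in entropy-regularized RL the policy is the softmax of the soft $Q$-values,
$\pi(a \mid s) = \exp\!\big((Q(s,a) - V(s))/\tau\big)$, where $V(s) = \tau \log \sum_{a'} \exp\!\big(Q(s,a')/\tau\big)$,
so a soft $Q$-learning update can be rewritten as a policy gradient update on this induced policy.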
3 code implementations • NeurIPS 2017 • Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks.
Ranked #1 on Atari Games on Atari 2600 Freeway
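A minimal sketch of the count-based bonus with static hashing (SimHash-style random projections); the bonus coefficient `beta` is a hyperparameter, and preprocessing of image observations is omitted.

```python
import numpy as np

class HashCountBonus:
    """Discretize observations with random-projection sign hashing, count visits
    per hash code, and give an exploration bonus that decays with the count."""
    def __init__(self, obs_dim, code_bits=32, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((code_bits, obs_dim))
        self.counts = {}
        self.beta = beta

    def __call__(self, obs):
        code = tuple((self.A @ np.asarray(obs) > 0).astype(np.uint8).tolist())
        self.counts[code] = self.counts.get(code, 0) + 1
        return self.beta / np.sqrt(self.counts[code])
```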
16 code implementations • 9 Nov 2016 • Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP.
no code implementations • 8 Nov 2016 • Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification.
1 code implementation • 21 Jun 2016 • Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society.
37 code implementations • NeurIPS 2016 • Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner.
Ranked #3 on Image Generation on Stanford Dogs
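Informally, the objective being described augments the usual GAN minimax game with a variational lower bound $L_I$ on the mutual information between the latent codes $c$ and the generator output:
$\min_{G,Q} \max_{D} \; V_{\mathrm{GAN}}(D, G) - \lambda\, L_I(G, Q)$,
where $Q$ is an auxiliary network approximating the posterior over codes given a generated sample.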
44 code implementations • 5 Jun 2016 • Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba
OpenAI Gym is a toolkit for reinforcement learning research.
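A minimal usage sketch (this uses the classic Gym API; newer Gym/Gymnasium releases return `(obs, info)` from `reset` and split `done` into `terminated`/`truncated`):

```python
import gym

env = gym.make("CartPole-v1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print("episode return:", total_reward)
```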
2 code implementations • NeurIPS 2016 • Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios.
1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang
Since its introduction, Theano has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.
15 code implementations • 22 Apr 2016 • Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning.
Ranked #1 on Continuous Control on Inverted Pendulum
1 code implementation • NeurIPS 2015 • John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel
In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world.
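The basic building block for differentiating such losses is the score-function (likelihood-ratio) estimator,
$\nabla_\theta\, \mathbb{E}_{x \sim p_\theta}[f(x)] = \mathbb{E}_{x \sim p_\theta}\big[f(x)\, \nabla_\theta \log p_\theta(x)\big]$,
combined where possible with the pathwise (reparameterization) estimator for stochastic nodes that can be written as deterministic functions of noise.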
17 code implementations • 8 Jun 2015 • John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel
Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.
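A minimal sketch of generalized advantage estimation for a single trajectory; the `values` array is assumed to include a bootstrap value for the final state, and episode boundaries are not handled.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Backward recursion over TD residuals: A_t = delta_t + gamma*lam*A_{t+1}."""
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv
```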
21 code implementations • 19 Feb 2015 • John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement.
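In its practical (TRPO) form, each iteration solves a KL-constrained surrogate maximization around the current policy $\pi_{\theta_\mathrm{old}}$:
$\max_\theta\; \mathbb{E}\big[\tfrac{\pi_\theta(a \mid s)}{\pi_{\theta_\mathrm{old}}(a \mid s)}\, A^{\pi_{\theta_\mathrm{old}}}(s, a)\big] \quad \text{s.t.} \quad \mathbb{E}\big[\mathrm{KL}(\pi_{\theta_\mathrm{old}}(\cdot \mid s)\, \|\, \pi_\theta(\cdot \mid s))\big] \le \delta$.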