Proximal Policy Optimization Algorithms

20 Jul 2017 · John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
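The paper's central surrogate is the clipped objective L^CLIP(θ) = E_t[min(r_t(θ) Â_t, clip(r_t(θ), 1−ε, 1+ε) Â_t)], where r_t(θ) is the probability ratio between the new and old policies and Â_t is an advantage estimate. The sketch below shows how that clipped loss can be computed for a minibatch; it is a minimal illustration, and the function name, tensor arguments, and `clip_eps` default are illustrative choices rather than code from the paper.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective, negated so it can be minimized with SGD.

    Names and signature are illustrative; clip_eps plays the role of the
    paper's epsilon hyperparameter.
    """
    # Probability ratio r_t(theta) = pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t)
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Unclipped and clipped surrogate terms
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (elementwise minimum) bound, averaged over the minibatch;
    # the minus sign turns surrogate maximization into loss minimization.
    return -torch.min(unclipped, clipped).mean()
```

Because the clipping removes the incentive to move the ratio far outside [1−ε, 1+ε], this loss can be optimized for multiple epochs of minibatch updates on the same batch of sampled trajectories, which is the property the abstract highlights.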
