Long-Term Planning and Situational Awareness in OpenAI Five

no code implementations • 13 Dec 2019 • Jonathan Raiman, Susan Zhang, Filip Wolski

Understanding how knowledge about the world is represented within model-free deep reinforcement learning methods is a major challenge given the black-box nature of their learning process within high-dimensional observation and action spaces.

Dota 2

Proximal Policy Optimization Algorithms

175 code implementations • 20 Jul 2017 • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.

Continuous Control, Dota 2

