Continuous control with deep reinforcement learning

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
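Below is a minimal sketch of the update the abstract describes: an actor-critic with a deterministic policy gradient over continuous actions (DDPG). The use of PyTorch, the network sizes, and the hyper-parameters (`gamma`, `tau`, hidden width) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal DDPG-style update: deterministic actor mu(s), critic Q(s,a),
# target networks with soft (Polyak) updates. Illustrative sketch only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy mu(s) -> a in [-1, 1]^act_dim."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Action-value function Q(s, a)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    # batch tensors: obs (B, obs_dim), act (B, act_dim), rew/done (B,), next_obs (B, obs_dim)
    obs, act, rew, next_obs, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        target_q = rew + gamma * (1.0 - done) * critic_targ(next_obs, actor_targ(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, ascend Q(s, mu(s)).
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks toward the online networks.
    with torch.no_grad():
        for p, p_targ in zip(actor.parameters(), actor_targ.parameters()):
            p_targ.mul_(1 - tau).add_(tau * p)
        for p, p_targ in zip(critic.parameters(), critic_targ.parameters()):
            p_targ.mul_(1 - tau).add_(tau * p)
```

In a full training loop the target networks would be initialized as copies of the online networks (e.g. `copy.deepcopy`), transitions would be sampled from a replay buffer, and exploration would come from adding noise to the actor's output (the paper uses an Ornstein-Uhlenbeck process).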

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| OpenAI Gym | Ant-v4 | DDPG | Average Return | 1712.12 | #4 |
| OpenAI Gym | HalfCheetah-v4 | DDPG | Average Return | 14934.86 | #2 |
| OpenAI Gym | Hopper-v4 | DDPG | Average Return | 1290.24 | #4 |
| OpenAI Gym | Humanoid-v4 | DDPG | Average Return | 139.14 | #5 |
| Continuous Control | Lunar Lander (OpenAI Gym) | DDPG | Score | 256.98±14.38 | #3 |
| OpenAI Gym | Walker2d-v4 | DDPG | Average Return | 2994.54 | #3 |