Benchmarking Deep Reinforcement Learning for Continuous Control

22 Apr 2016  ·  Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.
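The reference implementations released with the benchmark follow a common pattern: wrap an environment, define a policy and a baseline, and hand them to an algorithm. A minimal sketch of running TRPO on the cart-pole task with the rllab API might look like the following (the specific hyperparameter values shown are illustrative, not the paper's reported settings):

```python
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

# Normalize observations/actions of the cart-pole environment.
env = normalize(CartpoleEnv())

# Gaussian MLP policy: outputs a diagonal Gaussian over continuous actions.
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))

# Linear baseline fit on state features, used to reduce gradient variance.
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,       # samples collected per iteration (illustrative)
    max_path_length=100,   # episode horizon (illustrative)
    n_itr=40,              # number of training iterations (illustrative)
    discount=0.99,
    step_size=0.01,        # KL-divergence trust-region constraint
)
algo.train()
```

Other algorithms evaluated in the benchmark plug into the same interface, so swapping the algorithm class is typically enough to reproduce a different row of the results table.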


Datasets


Introduced in the Paper:

RLLab Framework

Used in the Paper:

MuJoCo
All entries below share the same task (Continuous Control), model (TRPO), and metric (Score); each holds global rank # 1 on its benchmark.

Environment                                     Score
2D Walker                                      1353.8
Acrobot                                        -326
Acrobot (limited sensors)                       -83.3
Acrobot (noisy observations)                   -149.6
Acrobot (system identification)                -170.9
Ant                                             730.2
Ant + Gathering                                  -0.4
Ant + Maze                                        0
Cart-Pole Balancing                            4869.8
Cart-Pole Balancing (limited sensors)           960.2
Cart-Pole Balancing (noisy observations)        606.2
Cart-Pole Balancing (system identification)     980.3
Double Inverted Pendulum                       4412.4
Full Humanoid                                   287
Half-Cheetah                                   1914
Hopper                                         1183.3
Inverted Pendulum                               247.2
Inverted Pendulum (limited sensors)               4.5
Inverted Pendulum (noisy observations)           10.4
Inverted Pendulum (system identification)        14.1
Mountain Car                                    -61.7
Mountain Car (limited sensors)                  -64.2
Mountain Car (noisy observations)               -60.2
Mountain Car (system identification)            -61.6
Simple Humanoid                                 269.7
Swimmer                                          96
Swimmer + Gathering                               0
Swimmer + Maze                                    0
