Smooth Exploration for Robotic Reinforcement Learning

12 May 2020 · Antonin Raffin, Jens Kober, Freek Stulp

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration and can even damage the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose two extensions to the original SDE: using more general features and re-sampling the noise periodically. This yields a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on three different real robots: a tendon-driven elastic robot, a quadruped, and an RC car. The noise sampling interval of gSDE allows a trade-off between performance and smoothness, making it possible to train directly on the real robots without loss of performance. The code is available at https://github.com/DLR-RM/stable-baselines3.


Results


All results are episode return on PyBullet continuous control benchmarks; ranks are per environment (higher return is better).

| Task | Dataset | Model | Return | Global Rank |
|---|---|---|---|---|
| Continuous Control | PyBullet Ant | SAC + gSDE | 3459 | #1 |
| Continuous Control | PyBullet Ant | TD3 + gSDE | 3267 | #2 |
| Continuous Control | PyBullet Ant | TD3 | 2865 | #3 |
| Continuous Control | PyBullet Ant | SAC | 2859 | #4 |
| Continuous Control | PyBullet Ant | PPO + gSDE | 2587 | #5 |
| Continuous Control | PyBullet Ant | A2C + gSDE | 2560 | #6 |
| Continuous Control | PyBullet Ant | PPO | 2160 | #7 |
| Continuous Control | PyBullet Ant | A2C | 1967 | #8 |
| Continuous Control | PyBullet HalfCheetah | SAC | 2883 | #1 |
| Continuous Control | PyBullet HalfCheetah | SAC + gSDE | 2850 | #2 |
| Continuous Control | PyBullet HalfCheetah | PPO + gSDE | 2760 | #3 |
| Continuous Control | PyBullet HalfCheetah | TD3 | 2687 | #4 |
| Continuous Control | PyBullet HalfCheetah | TD3 + gSDE | 2578 | #5 |
| Continuous Control | PyBullet HalfCheetah | PPO | 2254 | #6 |
| Continuous Control | PyBullet HalfCheetah | A2C + gSDE | 2028 | #7 |
| Continuous Control | PyBullet HalfCheetah | A2C | 1652 | #8 |
| Continuous Control | PyBullet Hopper | SAC + gSDE | 2646 | #1 |
| Continuous Control | PyBullet Hopper | PPO + gSDE | 2508 | #2 |
| Continuous Control | PyBullet Hopper | SAC | 2477 | #3 |
| Continuous Control | PyBullet Hopper | TD3 | 2470 | #4 |
| Continuous Control | PyBullet Hopper | TD3 + gSDE | 2353 | #5 |
| Continuous Control | PyBullet Hopper | PPO | 1622 | #6 |
| Continuous Control | PyBullet Hopper | A2C | 1559 | #7 |
| Continuous Control | PyBullet Hopper | A2C + gSDE | 1448 | #8 |
| Continuous Control | PyBullet Walker2D | SAC + gSDE | 2341 | #1 |
| Continuous Control | PyBullet Walker2D | SAC | 2215 | #2 |
| Continuous Control | PyBullet Walker2D | TD3 | 2106 | #3 |
| Continuous Control | PyBullet Walker2D | TD3 + gSDE | 1989 | #4 |
| Continuous Control | PyBullet Walker2D | PPO + gSDE | 1776 | #5 |
| Continuous Control | PyBullet Walker2D | PPO | 1238 | #6 |
| Continuous Control | PyBullet Walker2D | A2C + gSDE | 694 | #7 |
| Continuous Control | PyBullet Walker2D | A2C | 443 | #8 |
