The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties.
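The alternation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the "robot" is a hypothetical one-dimensional linear system, the policy is a single feedback gain, and the model is a least-squares point estimate. In particular, the sketch ignores model uncertainty, which the algorithms referred to above take into account.

```python
import numpy as np

# Hypothetical "robot": a 1-D linear system x' = A_TRUE*x + B_TRUE*u.
# The learner never reads these constants directly; it only sees rollouts.
A_TRUE, B_TRUE = 1.1, 0.5

def rollout_real(k, rng, steps=20):
    """Run one episode on the real system with policy u = -k*x plus exploration noise."""
    x, data = 1.0, []
    for _ in range(steps):
        u = -k * x + 0.1 * rng.standard_normal()
        x_next = A_TRUE * x + B_TRUE * u
        data.append((x, u, x_next))
        x = x_next
    return data

def fit_model(data):
    """Least-squares fit of x' = a*x + b*u (a point estimate; uncertainty is ignored)."""
    d = np.array(data)
    coeffs, *_ = np.linalg.lstsq(d[:, :2], d[:, 2], rcond=None)
    return coeffs  # (a_hat, b_hat)

def model_return(k, a, b, steps=20):
    """Return predicted by the learned model: negative sum of quadratic costs."""
    x, total = 1.0, 0.0
    for _ in range(steps):
        u = -k * x
        total -= x**2 + 0.1 * u**2
        x = a * x + b * u
    return total

def model_based_policy_search(episodes=3, seed=0):
    rng = np.random.default_rng(seed)
    k, data = 0.0, []
    candidates = np.linspace(0.0, 4.0, 81)  # candidate feedback gains
    for _ in range(episodes):
        data += rollout_real(k, rng)        # 1. collect data on the robot
        a_hat, b_hat = fit_model(data)      # 2. learn the dynamics model
        # 3. optimize the policy using only the learned model
        k = max(candidates, key=lambda c: model_return(c, a_hat, b_hat))
    return k, (a_hat, b_hat)

k_opt, (a_hat, b_hat) = model_based_policy_search()
```

Only step 1 touches the real system; the policy improvement in step 3 is evaluated entirely in the learned model, which is where the data efficiency comes from.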
In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes.
However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode; that is, the robot does not learn autonomously.
In this paper, we introduce a new benchmark for trajectory optimization and posture generation of legged robots, using a predefined scenario, robot, and constraints, as well as evaluation criteria.
One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot.
Locomotion is a prime example of adaptive behavior in animals, and biological control principles have inspired control architectures for legged robots.