The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties.
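To make the alternation concrete, the following is a minimal Python sketch of such a loop on a toy one-dimensional system. The function names (`real_rollout`, `fit_model`, `optimize_policy`), the least-squares dynamics model, and the random-search policy optimizer are illustrative assumptions, and the propagation of model uncertainty that full methods perform is omitted for brevity.

```python
import numpy as np

# Sketch of model-based policy search: alternate between (1) fitting a
# dynamics model to observed transitions and (2) optimizing the policy
# against rollouts of that model. All names here are hypothetical.

rng = np.random.default_rng(0)

def real_rollout(policy_params, horizon=20):
    """Collect (state, action, next_state) transitions from a toy 1-D system."""
    s, transitions = 0.0, []
    for _ in range(horizon):
        a = float(np.tanh(policy_params[0] * s + policy_params[1]))
        s_next = 0.9 * s + a + 0.01 * rng.standard_normal()  # unknown true dynamics
        transitions.append((s, a, s_next))
        s = s_next
    return transitions

def fit_model(transitions):
    """Least-squares fit of s' ~ w0*s + w1*a (a stand-in for a learned model)."""
    X = np.array([[s, a] for s, a, _ in transitions])
    y = np.array([s_next for _, _, s_next in transitions])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def optimize_policy(model_w, n_candidates=200, horizon=20):
    """Random search for policy parameters maximizing return under the model."""
    best_params, best_ret = None, -np.inf
    for _ in range(n_candidates):
        params = rng.uniform(-2, 2, size=2)
        s, ret = 0.0, 0.0
        for _ in range(horizon):
            a = float(np.tanh(params[0] * s + params[1]))
            s = model_w[0] * s + model_w[1] * a  # simulate with the learned model
            ret += -s**2                         # reward: drive the state to zero
        if ret > best_ret:
            best_params, best_ret = params, ret
    return best_params

policy = rng.uniform(-2, 2, size=2)
data = []
for episode in range(5):                 # the alternation at the heart of the method
    data += real_rollout(policy)         # gather real data with the current policy
    model = fit_model(data)              # (re)learn the dynamics model
    policy = optimize_policy(model)      # improve the policy against the model
```

Because every policy update is computed against the cheap learned model rather than the physical robot, only a handful of real episodes are needed, which is the source of the data efficiency claimed above.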
In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes.
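A hedged sketch of this sim-to-real idea follows: a small neural-network policy is optimized entirely in a cheap simulator, with randomized dynamics parameters as a common aid to transfer, and is then evaluated once on the "real" system. The toy system, the random-search trainer, and all names below are stand-ins for illustration, not the actual training pipeline of the work cited.

```python
import numpy as np

# Toy sim-to-real transfer: train in simulation with randomized dynamics,
# then deploy on a system whose parameters were never seen during training.

rng = np.random.default_rng(3)

def rollout(params, gain, horizon=50):
    """Return of a 1-hidden-layer tanh policy on a 1-D system with given gain."""
    W1, b1, W2 = params
    s, ret = 1.0, 0.0
    for _ in range(horizon):
        a = float(W2 @ np.tanh(W1 * s + b1))
        s = gain * s + 0.1 * a
        ret += -s**2
    return ret

def init_params():
    return [rng.standard_normal(8) * 0.5, rng.standard_normal(8) * 0.5,
            rng.standard_normal(8) * 0.5]

# Simulation-only training: random search, each candidate scored on several
# simulators with randomized dynamics (fast, automated data generation).
best, best_ret = init_params(), -np.inf
for _ in range(300):
    cand = init_params()
    ret = np.mean([rollout(cand, gain=rng.uniform(0.85, 0.99)) for _ in range(5)])
    if ret > best_ret:
        best, best_ret = cand, ret

# "Deployment": the real system's gain was never seen during training.
print("return on the real system:", rollout(best, gain=0.93))
```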
However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode; that is, the robot does not learn autonomously.
One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot.
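As an illustration of how such a prior can enter the optimization, the sketch below runs Bayesian optimization over a one-dimensional policy parameter, using a cheap simulated cost as the prior mean of a Gaussian process and a lower-confidence-bound acquisition minimized over a grid. The kernel, the acquisition rule, and the function names are assumptions chosen for brevity, not the method of any specific paper.

```python
import numpy as np

# Bayesian optimization with a simulation-derived GP prior mean.
rng = np.random.default_rng(1)

def sim_cost(x):      # prior from simulation (cheap, biased)
    return (x - 0.4)**2

def real_cost(x):     # expensive evaluation on the robot (here, synthetic)
    return (x - 0.6)**2 + 0.01 * rng.standard_normal()

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

X, y = [], []
grid = np.linspace(0, 1, 200)
for it in range(10):
    if X:
        Xa, ya = np.array(X), np.array(y)
        K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))
        k_star = rbf(grid, Xa)
        # GP posterior with the simulated cost as the prior mean function:
        alpha = np.linalg.solve(K, ya - sim_cost(Xa))
        mu = sim_cost(grid) + k_star @ alpha
        var = 1.0 - np.einsum('ij,ij->i', k_star @ np.linalg.inv(K), k_star)
        acq = mu - 2.0 * np.sqrt(np.maximum(var, 0))  # lower confidence bound
    else:
        acq = sim_cost(grid)  # before any robot data, trust the prior alone
    x_next = grid[np.argmin(acq)]
    X.append(x_next)
    y.append(real_cost(x_next))

print("best parameter found:", X[int(np.argmin(y))])
```

The first evaluation is placed at the simulator's optimum; subsequent robot trials correct the prior's bias, which is how the prior accelerates learning without being trusted blindly.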
In this paper, we introduce a new benchmark for trajectory optimization and posture generation of legged robots, using a predefined scenario, robot, and constraints, as well as evaluation criteria.
We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives: policies that are executed for large numbers of timesteps.
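The structural idea can be sketched as follows: a master policy selects one of K primitives and commits to it for many timesteps, so the master acts on a coarser timescale than the primitives. The linear policies, the toy environment, and all names here are illustrative assumptions standing in for the neural networks and meta-training procedure of the full approach.

```python
import numpy as np

# Hierarchy of shared primitives: a master policy picks one of K sub-policies
# and commits to it for `primitive_len` timesteps.
rng = np.random.default_rng(2)
K, primitive_len, obs_dim, act_dim = 4, 10, 3, 2

# Each primitive is a small linear policy; the master is a linear scorer.
primitives = [rng.standard_normal((act_dim, obs_dim)) * 0.1 for _ in range(K)]
master_w = rng.standard_normal((K, obs_dim)) * 0.1

def master_choose(obs):
    """Pick the primitive whose score is highest for the current observation."""
    return int(np.argmax(master_w @ obs))

def run_episode(env_step, obs, horizon=100):
    total_reward, t = 0.0, 0
    while t < horizon:
        k = master_choose(obs)                 # high-level decision
        for _ in range(primitive_len):         # commit for many timesteps
            action = primitives[k] @ obs       # low-level control
            obs, reward = env_step(obs, action)
            total_reward += reward
            t += 1
            if t >= horizon:
                break
    return total_reward

# Toy environment: reward for keeping the observation near the origin.
def toy_step(obs, action):
    new_obs = 0.95 * obs + 0.05 * np.pad(action, (0, obs_dim - act_dim))
    return new_obs, -float(new_obs @ new_obs)

print(run_episode(toy_step, rng.standard_normal(obs_dim)))
```

Because the primitives are shared across tasks while only the master is relearned, a new task requires exploring over K coarse choices rather than raw actions at every step, which is where the sample-efficiency gain comes from.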