This paper proposes a simple strategy for sim-to-real transfer in Deep Reinforcement Learning (DRL) -- called Roll-Drop -- that uses dropout during simulation to account for observation noise during deployment without explicitly modelling its distribution for each state.
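To make the mechanism concrete, below is a minimal sketch of the idea, assuming a PyTorch actor network; the layer sizes, dropout rate, and class name are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class RollDropActor(nn.Module):
    """Actor network with dropout applied to the observation input.

    Zeroing random observation components during simulated rollouts stands
    in for the unmodelled sensor noise the policy will face at deployment.
    """

    def __init__(self, obs_dim: int, act_dim: int, drop_rate: float = 0.1):
        super().__init__()
        self.obs_dropout = nn.Dropout(p=drop_rate)  # rate is a hypothetical choice
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Dropout is active only in train mode; in eval mode (deployment)
        # the observation passes through unchanged.
        return self.net(self.obs_dropout(obs))

actor = RollDropActor(obs_dim=48, act_dim=12)
actor.train()                         # dropout on during simulated rollouts
action = actor(torch.randn(1, 48))
actor.eval()                          # dropout off on the real robot
```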
Robotic locomotion is often approached with the goal of maximizing robustness and reactivity by increasing motion control frequency.
This allows us to obtain locomotion policies that are robust to variations in system dynamics.
We evaluate our approach on two versions of the real ANYmal quadruped robot and demonstrate that our method achieves a continuous blend of dynamic trot styles whilst remaining robust and reactive to external perturbations.
This encourages disentanglement, such that applying a drive signal to a single dimension of the latent state induces holistic plans that synthesise a continuous variety of trot styles.
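A minimal sketch of how such a drive signal might be applied is given below, assuming a trained generative decoder that maps latent states to joint targets; the decoder, the dimensions, and the index of the style dimension are all hypothetical placeholders, not the paper's trained model.

```python
import torch

# Hypothetical trained decoder: 16-D latent state -> 12 joint targets.
decoder = torch.nn.Linear(16, 12)   # placeholder for the learned generative model

z = torch.zeros(16)                 # nominal latent state
style_dim = 3                       # the single disentangled dimension (hypothetical)
for drive in torch.linspace(-2.0, 2.0, steps=5):
    z_driven = z.clone()
    z_driven[style_dim] = drive     # apply the drive signal to one dimension only
    joint_targets = decoder(z_driven)
    # Each drive value yields a different trot style; intermediate values
    # blend continuously between styles.
```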
Our results on a locomotion task with a single-leg hopper demonstrate that explicitly using the CPG as the Actor, rather than as part of the environment, yields a significant increase in the reward gained over time (6x) compared with previous approaches.
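Below is a minimal sketch of the CPG-as-Actor arrangement, assuming a single sinusoidal oscillator whose parameters are what the RL algorithm optimises; the parameterisation and class name are illustrative, and a multi-legged system would couple several oscillators.

```python
import numpy as np

class CPGActor:
    """A Central Pattern Generator used directly as the RL actor.

    The oscillator parameters are the policy parameters the RL algorithm
    optimises from episodic reward; the oscillator itself emits the joint
    command, instead of being buried inside the environment dynamics.
    """

    def __init__(self, amplitude=0.5, frequency=2.0, offset=0.0):
        self.theta = np.array([amplitude, frequency, offset])  # learnable
        self.phase = 0.0

    def act(self, dt: float) -> float:
        amplitude, frequency, offset = self.theta
        self.phase += 2.0 * np.pi * frequency * dt      # integrate oscillator phase
        return offset + amplitude * np.sin(self.phase)  # joint position command

# The RL update adjusts actor.theta (e.g. via a policy-gradient or
# evolutionary step) based on the reward the resulting gait collects.
actor = CPGActor()
command = actor.act(dt=0.01)
```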
We evaluate the robustness of our method over a wide variety of complex terrains.
In addition, kinodynamic constraints are often non-differentiable -- contact making and breaking, for example, introduces discrete mode switches -- and are therefore difficult to encode in an optimisation-based approach.
Deep reinforcement learning (DRL) typically uses model-free techniques to optimise task-specific control policies.
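As an illustration of the model-free setting, here is a minimal REINFORCE-style policy-gradient update, assuming batched rollout data; the dimensions and network are hypothetical, and no dynamics model appears anywhere, only sampled observations, actions, and returns.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 1              # hypothetical dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
log_std = torch.zeros(act_dim, requires_grad=True)
opt = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=3e-4)

def update(observations, actions, returns):
    """One model-free policy-gradient (REINFORCE) step.

    The log-probability of the sampled actions is pushed up in proportion
    to the return they obtained; no model of the environment dynamics is
    learned or used, only the sampled rollout data itself.
    """
    mean = policy(observations)
    dist = torch.distributions.Normal(mean, log_std.exp())
    log_prob = dist.log_prob(actions).sum(dim=-1)
    loss = -(log_prob * returns).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```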