We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle.
Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms, which achieve strong performance across multiple domains.
Both animals and artificial agents benefit from state representations that support rapid transfer of learning across tasks and that enable them to traverse their environments efficiently to reach rewarding states.
In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control.
We propose a novel optimization approach for policy gradient methods and evolution strategies in reinforcement learning (RL).
Standard gradient descent methods are susceptible to a range of issues that can impede training, such as strong correlations between parameters and widely differing scales across parameter dimensions. These difficulties can be addressed by second-order approaches that apply a pre-conditioning matrix to the gradient to improve convergence.
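As a concrete illustration (a minimal sketch, not taken from any of the methods above), the following Python example contrasts plain gradient descent with a pre-conditioned update on a toy ill-conditioned quadratic loss; the Hessian `H`, the helper `descend`, and the step sizes are hypothetical choices made for the example.

```python
import numpy as np

# Toy ill-conditioned quadratic L(w) = 0.5 * w^T H w: one direction of the
# loss surface is 100x steeper than the other (differing parameter scales).
H = np.diag([100.0, 1.0])

def grad(w):
    """Gradient of L(w) = 0.5 * w^T H w, which is H @ w."""
    return H @ w

def descend(w0, precond, lr, steps=50):
    """Gradient descent with a fixed pre-conditioning matrix `precond`.

    With `precond` set to the identity this is plain gradient descent;
    with `precond` set to the inverse Hessian it is a Newton-style update.
    """
    w = w0.copy()
    for _ in range(steps):
        w -= lr * precond @ grad(w)
    return w

w0 = np.array([1.0, 1.0])

# Plain gradient descent: the step size must be small enough for the steep
# direction (lr < 2/100), so progress along the shallow direction is slow.
w_gd = descend(w0, np.eye(2), lr=0.01)

# Pre-conditioning with H^{-1} rescales both directions equally; for a
# quadratic loss this recovers Newton's method and converges in one step.
w_newton = descend(w0, np.linalg.inv(H), lr=1.0)

print("plain GD:      ", w_gd)      # still far from 0 along the shallow axis
print("preconditioned:", w_newton)  # at the minimum [0, 0]
```

In practice the exact inverse Hessian is rarely available for deep networks, which is why second-order methods approximate the pre-conditioner (e.g., with diagonal, Kronecker-factored, or Fisher-information-based estimates), but the rescaling effect shown above is the underlying mechanism.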