Jointly Learning Identification and Control for Few-Shot Policy Adaptation
Complex dynamical systems are challenging to model and control. When deployed outside controlled conditions, they may be subject to disturbances that cannot be predicted in advance, \emph{e.g.}, wind, a payload, or environment-specific forces. Adapting to such disturbances with a limited sample budget is difficult, especially for systems with many degrees of freedom. This paper introduces a theoretical framework to model this problem. We show that the expected error of a sensorimotor controller can be bounded by two components: the suboptimality of the controller and the domain gap between training and testing due to unmodelled dynamic effects. These components are usually minimized separately: the former with online or offline optimization, the latter with system identification. Motivated by this observation, we propose a differentiable programming approach to \emph{jointly} minimize model and control errors with gradient descent. Similar to model-based methods, our algorithm learns from prior knowledge about the system, but \emph{grounds} the model to account for observed disturbances, thereby favouring sample efficiency. At the same time, it retains the flexibility of model-free methods, which can be applied to generic systems with arbitrary inputs. We evaluate our approach on several complex systems and tasks, and experimentally analyze its advantages over model-free and model-based methods in terms of performance and sample efficiency.
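As a rough illustration of what jointly minimizing model and control errors with gradient descent can look like, the following is a minimal sketch on a toy 1-D system. It is not the paper's algorithm or one of its benchmark systems. Everything in it is an assumption made for illustration: the dynamics $\dot{x} = u + d$, the constant disturbance `TRUE_D`, the linear policy with gain `k` and bias `b`, the disturbance estimate `d_hat`, and the use of central finite differences as a stand-in for automatic differentiation.

```python
# Toy illustration only; not the paper's algorithm or systems.
# All names and constants below are hypothetical.
DT = 0.1        # integration step (assumed)
TRUE_D = 0.5    # disturbance on the "real" system, unknown to the learner


def step(x, u, d):
    """One Euler step of the dynamics x' = u + d."""
    return x + DT * (u + d)


def policy(x, k, b):
    """Linear feedback with a learnable bias."""
    return -k * x + b


def collect(k, b, x0=1.0, n=20):
    """Run the policy on the real system and log a few transitions."""
    data, x = [], x0
    for _ in range(n):
        u = policy(x, k, b)
        x_next = step(x, u, TRUE_D)
        data.append((x, u, x_next))
        x = x_next
    return data


def joint_loss(params, data, x0=1.0, n=20):
    k, b, d_hat = params
    # Model error: mismatch between observed and modelled state derivative.
    model_err = sum(((xn - x) / DT - (u + d_hat)) ** 2
                    for x, u, xn in data) / len(data)
    # Control error: cost of driving x to 0, rolled out through the model.
    x, ctrl_err = x0, 0.0
    for _ in range(n):
        x = step(x, policy(x, k, b), d_hat)
        ctrl_err += x * x
    return model_err + ctrl_err / n


def grad(f, params, eps=1e-5):
    """Central finite differences, standing in for autodiff."""
    g = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g


def train(steps=500, lr=0.1):
    params = [1.0, 0.0, 0.0]              # initial k, b, d_hat
    data = collect(params[0], params[1])  # small, fixed sample budget
    loss = lambda p: joint_loss(p, data)
    history = [loss(params)]
    for _ in range(steps):
        g = grad(loss, params)
        # A single gradient step updates controller and model together.
        params = [p - lr * gi for p, gi in zip(params, g)]
        history.append(loss(params))
    return params, history


params, history = train()
```

In this sketch the disturbance estimate receives gradient from both terms of the loss, so identification and control are optimized in one loop rather than in two separate stages. In a real differentiable-programming setup the finite-difference `grad` would presumably be replaced by automatic differentiation.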