In this paper, we show that, given the computational graph of the function, this bound can be reduced to $O(m\tau^3)$, where $\tau$ and $m$ are the width and size of a tree decomposition of the graph, respectively.
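To make $\tau$ and $m$ concrete, the following is a minimal sketch of measuring them for a toy graph using networkx's heuristic tree-decomposition routine; the example graph is illustrative, and `treewidth_min_degree` returns an upper bound on the true width rather than the exact value.

```python
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

# Toy computational graph: nodes are intermediate variables, edges connect
# variables that appear together in the same elementary operation.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3), (1, 3), (3, 4)])

tau, decomposition = treewidth_min_degree(G)  # heuristic upper bound on the width
m = decomposition.number_of_nodes()           # number of bags, i.e. the size

print(f"width tau = {tau}, size m = {m}")     # the bound scales as m * tau**3
```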
We study how local trajectory optimization can cope with approximation errors in the value function, and how it can stabilize and accelerate value function learning.
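As a rough illustration of this interplay, the sketch below implements a random-shooting planner that scores short model rollouts by their $H$-step reward plus a learned terminal value: the lookahead reduces reliance on the approximate value function to a single terminal evaluation. The dynamics model `f`, reward `r`, value function `V`, and all hyperparameters are assumed stand-ins, not the paper's method.

```python
import numpy as np

def lookahead_action(state, f, r, V, horizon=10, n_samples=256, action_dim=2):
    """Pick the first action of the best sampled action sequence, scoring each
    sequence by its H-step reward plus the learned terminal value V."""
    best_action, best_return = None, -np.inf
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = state, 0.0
        for a in actions:
            total += r(s, a)
            s = f(s, a)        # rollout under the dynamics model
        total += V(s)          # terminal value corrects for the short horizon
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action
```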
Reinforcement learning has emerged as a promising methodology for training robot controllers. However, deploying deep reinforcement learning (DRL) on physical systems remains challenging due to its sample inefficiency.
This work shows that policies with simple linear and RBF parameterizations can be trained to solve a variety of continuous control tasks, including the OpenAI Gym benchmarks.
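For concreteness, one common way to realize such an RBF parameterization is a linear policy on random Fourier features of the state, as in the minimal sketch below; the feature count, bandwidth, and class interface are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

class RBFPolicy:
    """Linear policy acting on random Fourier features of the state
    (one simple RBF parameterization; dimensions are illustrative)."""
    def __init__(self, state_dim, action_dim, n_features=500, bandwidth=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.P = rng.normal(size=(n_features, state_dim)) / bandwidth  # random projections
        self.phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)    # random phases
        self.W = np.zeros((action_dim, n_features))                    # trainable weights

    def features(self, state):
        return np.sin(self.P @ state + self.phase)

    def act(self, state):
        return self.W @ self.features(state)
```

Only `self.W` is trained, so the policy remains linear in its parameters despite the nonlinear feature map.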
We demonstrate that such controllers can perform the task robustly, both in simulation and on the physical platform, for a limited range of initial conditions around the trained starting state.
To facilitate optimal control applications, and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel.
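The sketch below shows one way to exploit this: finite-difference Jacobians of a generic dynamics function `step(x, u) -> x_next`, with the perturbed evaluations dispatched in parallel. The function name and thread-based parallelism are assumptions; threads only yield speedup when `step` releases the GIL, as a simulator implemented in C typically would.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fd_jacobians(step, x, u, eps=1e-6, workers=8):
    """Finite-difference Jacobians A = df/dx, B = df/du of step(x, u),
    evaluating the perturbed dynamics calls in parallel."""
    n, m = len(x), len(u)
    f0 = step(x, u)

    def col(i):
        if i < n:                              # perturb state coordinate i
            dx = np.zeros(n); dx[i] = eps
            return (step(x + dx, u) - f0) / eps
        du = np.zeros(m); du[i - n] = eps      # perturb control coordinate i - n
        return (step(x, u + du) - f0) / eps

    with ThreadPoolExecutor(max_workers=workers) as pool:
        cols = list(pool.map(col, range(n + m)))

    A = np.stack(cols[:n], axis=1)             # df/dx, shape (n, n)
    B = np.stack(cols[n:], axis=1)             # df/du, shape (n, m)
    return A, B
```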