Building on this policy optimization perspective, our paper extends these subgradient-based search methods to a model-free setting.
We show that region-of-attraction (ROA) analysis can be approximated by a constrained maximization problem whose goal is to find the worst-case initial condition, i.e., the one that shifts the terminal state the most.
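For concreteness, one illustrative way to write such a problem (the specific constraint set, norm, and horizon below are assumptions for exposition, not taken from the original statement) is
\[
\max_{x_0 \in \mathcal{X}_0} \ \big\| x_T(x_0) - x_e \big\|,
\]
where $x_T(x_0)$ is the state reached at a fixed horizon $T$ from initial condition $x_0$, $x_e$ is the equilibrium under study, and $\mathcal{X}_0$ is a candidate set of initial conditions; initial conditions attaining a large objective value witness points that escape the ROA.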
We establish a connection between robust adversarial RL and $\mu$ synthesis, and develop a model-free version of the well-known $DK$-iteration for solving state-feedback $\mu$ synthesis with static $D$-scaling.
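As background (this is the standard model-based $DK$-iteration, not the paper's model-free variant), the method alternately minimizes a scaled $H_\infty$ norm over the controller and the scaling:
\[
\min_{K} \ \min_{D \in \mathcal{D}} \ \big\| D \, \mathcal{F}_\ell(P, K) \, D^{-1} \big\|_{\infty},
\]
where $\mathcal{F}_\ell(P,K)$ is the closed-loop lower linear fractional transformation of the plant $P$ with controller $K$, and $\mathcal{D}$ is the set of admissible (here static) $D$-scalings commuting with the uncertainty structure. Each iteration fixes $D$ and performs an $H_\infty$ synthesis for $K$, then fixes $K$ and optimizes the scaling $D$.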
In this paper, we investigate the global convergence of gradient-based policy optimization methods for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS).
We implement the (data-driven) natural policy gradient method on different MJLS examples.
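As a sketch of the setting (standard MJLS LQR notation, assumed here rather than quoted from the paper), the dynamics and cost are
\[
x_{t+1} = A_{\omega_t} x_t + B_{\omega_t} u_t, \qquad
C(K) = \mathbb{E} \sum_{t=0}^{\infty} \big( x_t^\top Q_{\omega_t} x_t + u_t^\top R_{\omega_t} u_t \big),
\]
where $\omega_t$ is the Markov mode process and the policy is the mode-dependent state feedback $u_t = -K_{\omega_t} x_t$. The natural policy gradient update then preconditions the gradient for each mode $i$ by the corresponding state correlation matrix,
\[
K_i \leftarrow K_i - \eta \, \nabla_{K_i} C(K) \, \Sigma_{K,i}^{-1}, \qquad
\Sigma_{K,i} = \mathbb{E} \sum_{t=0}^{\infty} x_t x_t^\top \mathbf{1}\{\omega_t = i\};
\]
in the data-driven version, $\nabla_{K_i} C(K)$ and $\Sigma_{K,i}$ are replaced by sampled estimates.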
Policy optimization for control has recently received renewed attention due to the growing interest in reinforcement learning.
As a consequence, we obtain theoretical regret bounds on the sample efficiency of our solution that depend on key problem parameters such as smoothness, near-optimality dimension, and batch size.