no code implementations • 18 Dec 2018 • Dragos Florin Ciocan, Velibor V. Mišić
We formulate the problem of learning such policies from observed trajectories of the stochastic system as a sample average approximation (SAA) problem.
Marketing