This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system. Feedback policies typically need a high dimensional parametrization, which makes Reinforcement Learning (RL) algorithms that search for an optimum in this large parameter space, sample inefficient and subject to high variance... (read more)
PDFMETHOD | TYPE | |
---|---|---|
🤖 No Methods Found | Help the community by adding them if they're not listed; e.g. Deep Residual Learning for Image Recognition uses ResNet |