NeurIPS 2010 • Jonathan Sorg, Richard L. Lewis, Satinder P. Singh
In this work, we develop a gradient ascent approach with formal convergence guarantees for approximately solving the optimal reward problem online during an agent's lifetime.