27 Feb 2023 • Antoine Moulin, Gergely Neu
We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure.
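To make the idea of a regularized value iteration update concrete, the sketch below runs value iteration on a tiny tabular discounted MDP with the usual max over actions replaced by an entropy-regularized (log-sum-exp) backup. This is an illustration only and not the authors' algorithm: the choice of entropy regularization, the temperature `tau`, the function names, and the toy transition and reward tables are all assumptions, and the optimistic (exploration) component of the method is not modeled.

```python
import math

def soft_max(q, tau):
    """Entropy-regularized maximum: tau * log(sum(exp(q / tau))), computed stably."""
    m = max(q)
    return m + tau * math.log(sum(math.exp((x - m) / tau) for x in q))

def q_values(P, r, V, gamma):
    """One-step lookahead: Q(s, a) = r(s, a) + gamma * sum_t P(t | s, a) * V(t)."""
    n = len(r)
    return [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(r[s]))] for s in range(n)]

def soft_value_iteration(P, r, gamma=0.9, tau=0.5, iters=500):
    """Value iteration with a log-sum-exp backup (hypothetical illustration)."""
    V = [0.0] * len(r)
    for _ in range(iters):
        Q = q_values(P, r, V, gamma)
        V = [soft_max(q, tau) for q in Q]
    # Recompute Q from the final V and derive the softmax policy from it.
    Q = q_values(P, r, V, gamma)
    policy = []
    for q in Q:
        m = max(q)
        w = [math.exp((x - m) / tau) for x in q]
        z = sum(w)
        policy.append([x / z for x in w])
    return V, Q, policy

# Toy MDP (made up for illustration): P[s][a][t] is the probability of
# moving from state s to state t under action a, r[s][a] is the reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
r = [[0.0, 1.0],
     [0.5, 2.0]]

V, Q, policy = soft_value_iteration(P, r)
```

As `tau` shrinks toward zero, the log-sum-exp backup approaches the ordinary max, so the iteration approaches standard value iteration; larger `tau` yields smoother, more stochastic policies.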