NeurIPS 2023 • Alexandre Marthe, Aurélien Garivier, Claire Vernade
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics.
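For context, here is a minimal sketch of finite-horizon, undiscounted DP (backward induction) for the most familiar statistic, the expected cumulative reward. It illustrates the standard technique and is not code from the paper. The function name `backward_induction`, the transition format `P[s][a] = [(next_state, prob), ...]`, and the two-state toy MDP are all hypothetical choices made for this example.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon, undiscounted DP for the expected cumulative reward.

    P[s][a] is a list of (next_state, probability) pairs (assumed format).
    R[s][a] is the immediate reward for taking action a in state s.
    Returns (V, policy): V[s] is the optimal expected return from state s
    with `horizon` steps left, and policy[t][s] is the optimal action at step t.
    """
    n_states = len(P)
    V = [0.0] * n_states  # no reward can be collected once the horizon is reached
    policy = []
    for _ in range(horizon):
        new_V, step_policy = [], []
        for s in range(n_states):
            # One-step lookahead: immediate reward plus expected value-to-go.
            q = [
                R[s][a] + sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s]))
            ]
            best = max(range(len(q)), key=q.__getitem__)
            new_V.append(q[best])
            step_policy.append(best)
        V = new_V
        policy.append(step_policy)
    policy.reverse()  # steps were computed from last to first
    return V, policy


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
# Staying pays 1.0 in state 0 and 2.0 in state 1; switching pays nothing.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
R = [
    [1.0, 0.0],
    [2.0, 0.0],
]

V, policy = backward_induction(P, R, horizon=3)
print(V, policy[0])
```

With three steps available, the optimal first action in state 0 is to switch, giving up one unit of reward now to collect 2.0 on each of the two remaining steps. The recursion works because the expectation of a sum decomposes step by step; the abstract's question is which other functionals of the reward admit a comparable exact recursion.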