NeurIPS 2008 • Peter Auer, Thomas Jaksch, Ronald Ortner
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy.
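As a rough illustration of the quantity in question, the sketch below computes total regret after T steps as T times the optimal average reward minus the reward the learner actually collected. The function name `total_regret` and the parameter `rho_star` are hypothetical names chosen for this example, and the sketch assumes the optimal average reward is already known, which in practice it is not.

```python
from typing import Sequence


def total_regret(rewards: Sequence[float], rho_star: float) -> float:
    """Total regret after T = len(rewards) steps.

    rho_star is the average per-step reward of an optimal policy
    (assumed known here purely for illustration).
    """
    T = len(rewards)
    # Reward an optimal policy would accumulate on average over T steps,
    # minus the reward the learner actually collected.
    return T * rho_star - sum(rewards)


# Hypothetical reward sequence from a learner that improves over time.
observed = [0.0, 0.5, 1.0, 1.0]
regret = total_regret(observed, rho_star=1.0)
```

A learner whose total regret grows sublinearly in T has an average per-step reward that approaches the optimal one.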
Reinforcement Learning (RL)