Frequentist Regret Bounds for Randomized Least-Squares Value Iteration

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL). When the state space is large or continuous, traditional tabular approaches are unfeasible and some form of function approximation is mandatory...
