Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

ICLR 2020 · Kefan Dong, Yuanhao Wang, Xiaoyu Chen, Liwei Wang

A fundamental question in reinforcement learning is whether model-free algorithms are sample efficient. Recently, Jin et al. (2018) proposed a Q-learning algorithm with a UCB exploration policy and proved that it achieves a nearly optimal regret bound for finite-horizon episodic MDPs. In this paper, the authors adapt Q-learning with a UCB exploration bonus to infinite-horizon MDPs with discounted rewards, without access to a generative model, and show that its sample complexity of exploration is bounded by $\tilde{O}\left(\frac{SA}{\epsilon^2(1-\gamma)^7}\right)$, improving on the previously best known bound in this setting, achieved by delayed Q-learning.
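To make the algorithmic idea concrete, below is a minimal sketch of tabular Q-learning with a UCB-style exploration bonus, in the spirit of the algorithm the abstract describes. The environment interface (`env.reset()`, `env.step()`) and the exact learning-rate and bonus schedules are illustrative assumptions, not the paper's precise construction or constants.

```python
import numpy as np

def q_learning_ucb(env, num_states, num_actions, gamma=0.99,
                   num_steps=100_000, c=1.0):
    """Illustrative sketch of tabular Q-learning with a UCB-style bonus.

    Assumes a hypothetical env with reset() -> state and
    step(action) -> (next_state, reward, done). Schedules are schematic.
    """
    H = 1.0 / (1.0 - gamma)                       # effective horizon
    Q = np.full((num_states, num_actions), H)     # optimistic init at V_max
    N = np.zeros((num_states, num_actions), int)  # visit counts

    s = env.reset()
    for _ in range(num_steps):
        a = int(np.argmax(Q[s]))                  # greedy w.r.t. optimistic Q
        s_next, r, done = env.step(a)

        N[s, a] += 1
        t = N[s, a]
        alpha = (H + 1) / (H + t)                 # learning rate a la Jin et al.
        bonus = c * np.sqrt(H / t)                # count-based UCB bonus (schematic)

        target = r + bonus + gamma * Q[s_next].max()
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * min(target, H)

        s = env.reset() if done else s_next
    return Q
```

The optimistic initialization at $V_{\max} = 1/(1-\gamma)$ and the count-based bonus together drive the agent toward under-visited state-action pairs; the paper's sample-complexity analysis hinges on the precise choice of these learning-rate and bonus schedules.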
