Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

24 Feb 2020 · Adithya M. Devraj, Sean P. Meyn

Sample complexity bounds are a common performance metric in the reinforcement learning literature. In the discounted-cost, infinite-horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where $\gamma < 1$ is the discount factor...
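To make the dependence on the discount factor concrete, the short sketch below evaluates $1/(1-\gamma)$ and, as a hypothetical example of a polynomial factor, its cube, for discount factors approaching 1. The exponent (`power`) and the helper name `horizon_factor` are assumptions chosen for illustration; the abstract above does not state which power of $1/(1-\gamma)$ appears in the known bounds.

```python
def horizon_factor(gamma: float, power: int = 1) -> float:
    """Return (1 / (1 - gamma)) ** power.

    `power` is a hypothetical exponent used only to illustrate how a
    polynomial factor in 1/(1 - gamma) grows; it is not taken from the paper.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor must satisfy 0 <= gamma < 1")
    return (1.0 / (1.0 - gamma)) ** power


if __name__ == "__main__":
    # As gamma approaches 1, the factor grows without bound.
    for gamma in (0.9, 0.99, 0.999):
        print(
            f"gamma={gamma}: 1/(1-gamma)={horizon_factor(gamma):.0f}, "
            f"cubed={horizon_factor(gamma, 3):.3g}"
        )
```

Moving from $\gamma = 0.9$ to $\gamma = 0.999$ multiplies $1/(1-\gamma)$ by roughly 100, and a cubic factor by roughly $10^6$, which is why a large discount factor is usually regarded as a barrier to fast learning.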
