# Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

24 Feb 2020 · Adithya M. Devraj, Sean P. Meyn

Sample complexity bounds are a common performance metric in the reinforcement learning literature. In the discounted-cost, infinite-horizon setting, all known bounds include a factor that is polynomial in $1/(1-\gamma)$, where $\gamma < 1$ is the discount factor...

