Sample complexity bounds are a common performance metric in the Reinforcement Learning literature. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where $\gamma < 1$ is the discount factor...
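
To give a feel for why a polynomial dependence on $1/(1-\gamma)$ matters, the sketch below evaluates that factor for a few discount factors. It is an illustration only, not taken from the paper: the function name `effective_horizon` is hypothetical, and the cubic power printed in the demo is an arbitrary example exponent rather than any specific bound. It relies only on the geometric series identity $\sum_{t \ge 0} \gamma^t = 1/(1-\gamma)$ for $0 \le \gamma < 1$.

```python
def effective_horizon(gamma: float) -> float:
    """Return 1 / (1 - gamma), the sum of the discount weights gamma**t over t >= 0.

    Hypothetical helper for illustration; not part of the paper.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    for gamma in (0.9, 0.99, 0.999):
        h = effective_horizon(gamma)
        # The cube is an arbitrary illustrative exponent, not a bound from the paper.
        print(f"gamma={gamma}: 1/(1-gamma)={h:.1f}, cubed={h ** 3:.3g}")
```

As $\gamma$ approaches 1 the factor grows without bound, so any polynomial in it grows quickly as well.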
Q-Learning

Off-Policy TD Control