NeurIPS 2020 • Diogo Carvalho, Francisco S. Melo, Pedro Santos
In this work, we identify a novel set of conditions that ensure convergence with probability 1 of Q-learning with linear function approximation, by proposing a two time-scale variation thereof.