Understanding the Impact of Data Distribution on Q-learning with Function Approximation

23 Nov 2021 · Pedro P. Santos, Francisco S. Melo, Alberto Sardinha, Diogo S. Carvalho

In this work, we study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis of why different properties of the data distribution can help regulate sources of algorithmic instability. First, we study theoretical bounds on the performance of approximate dynamic programming algorithms. Second, we provide a novel four-state MDP that highlights the impact of the data distribution on the performance of a Q-learning algorithm with function approximation, both in online and offline settings. Finally, we experimentally assess the impact of data distribution properties on the performance of offline Q-learning-based algorithms. Our results emphasize the key role played by data distribution properties in regulating algorithmic stability. We provide a systematic and comprehensive study of the topic by connecting different lines of research, as well as validating and extending previous results, both theoretically and empirically.
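To make the setting concrete, the following is a minimal, hedged sketch of offline Q-learning with linear function approximation, where transitions are sampled from a fixed data distribution over state-action pairs. The MDP, features, and distribution below are illustrative assumptions, not the paper's four-state MDP or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
gamma = 0.9

# A small, randomly generated (but fixed) MDP: P[s, a] is a
# next-state distribution, R[s, a] an expected reward in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Linear features over (s, a); one-hot here, so the function class
# is expressive enough to represent Q exactly.
def phi(s, a):
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

w = np.zeros(n_states * n_actions)

# Fixed data distribution mu over (s, a) pairs -- the object whose
# properties (e.g. coverage) the abstract argues affect stability.
mu = rng.dirichlet(np.ones(n_states * n_actions))

alpha = 0.1
for _ in range(20_000):
    # Sample (s, a) from mu, then a transition from the MDP.
    idx = rng.choice(n_states * n_actions, p=mu)
    s, a = divmod(idx, n_actions)
    s2 = rng.choice(n_states, p=P[s, a])
    r = R[s, a]
    # Semi-gradient Q-learning update on the linear parameters.
    td_error = r + gamma * max(phi(s2, b) @ w for b in range(n_actions)) - phi(s, a) @ w
    w += alpha * td_error * phi(s, a)

Q = w.reshape(n_states, n_actions)
print(Q)
```

Because `mu` assigns positive probability to every state-action pair and the features are one-hot, this run is well behaved; restricting the support of `mu` (poor coverage) or using non-trivial features is exactly where the instability phenomena studied in the paper arise.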
