1 code implementation • 10 Apr 2020 • Mohammad Reza Bonyadi, Rui Wang, Maryam Ziaei
We prove that, under certain assumptions and regardless of the reinforcement learning algorithm used, these two strategies maintain the order of policies in the space of all possible policies in terms of their total reward, and, by extension, maintain the optimal policy.