no code implementations • 6 Jun 2020 • Daoming Lyu, Bo Liu, Matthieu Geist, Wen Dong, Saad Biaz, Qi Wang
Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy.
no code implementations • 17 Apr 2017 • Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz
Temporal difference learning and Residual Gradient methods are the most widely used temporal-difference-based learning algorithms; however, it has been shown that neither of their objective functions is optimal w.r.t. approximating the true value function $V$.
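
For readers unfamiliar with the two baseline methods named above, the following is a minimal sketch of their update rules with a linear value function $V(s) \approx \theta^\top \phi(s)$. It is not the method proposed in the paper. The two-state deterministic chain, the one-hot features, the step size, and all function names are assumptions chosen for illustration. Because the features are tabular and the transitions deterministic, both updates recover the true values here; the differences between the two objectives only appear under genuine function approximation or stochastic transitions.

```python
# Illustrative toy problem (an assumption, not from the paper):
# a deterministic two-state cycle, 0 -> 1 with reward 1, 1 -> 0 with reward 0.
GAMMA = 0.5
FEATURES = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot (tabular) features
TRANSITIONS = [(0, 1.0, 1), (1, 0.0, 0)]   # (state, reward, next_state)


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def td_error(theta, phi, reward, phi_next, gamma):
    """Temporal difference error: r + gamma * V(s') - V(s)."""
    return reward + gamma * dot(theta, phi_next) - dot(theta, phi)


def td0_update(theta, phi, reward, phi_next, gamma, alpha):
    """Semi-gradient TD(0): the bootstrapped target is treated as a constant."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    return [t + alpha * delta * f for t, f in zip(theta, phi)]


def residual_gradient_update(theta, phi, reward, phi_next, gamma, alpha):
    """Residual gradient: full gradient descent on 0.5 * delta**2."""
    delta = td_error(theta, phi, reward, phi_next, gamma)
    return [
        t + alpha * delta * (f - gamma * fn)
        for t, f, fn in zip(theta, phi, phi_next)
    ]


def evaluate(update, sweeps=2000, alpha=0.1):
    """Repeatedly sweep over all transitions, applying the given update rule."""
    theta = [0.0, 0.0]
    for _ in range(sweeps):
        for s, r, s_next in TRANSITIONS:
            theta = update(theta, FEATURES[s], r, FEATURES[s_next], GAMMA, alpha)
    return theta


if __name__ == "__main__":
    # Solving V(0) = 1 + 0.5 * V(1), V(1) = 0.5 * V(0) gives V = (4/3, 2/3).
    print("TD(0):            ", evaluate(td0_update))
    print("Residual gradient:", evaluate(residual_gradient_update))
```

The only difference between the two rules is the direction of the step: TD(0) moves along $\phi(s)$, while the residual gradient moves along $\phi(s) - \gamma\,\phi(s')$. With stochastic transitions, an unbiased residual gradient step would additionally need two independent samples of the next state, which this deterministic sketch sidesteps.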