Search Results for author: Saad Biaz

Found 2 papers, 0 papers with code

Stable and Efficient Policy Evaluation

no code implementations • 6 Jun 2020 • Daoming Lyu, Bo Liu, Matthieu Geist, Wen Dong, Saad Biaz, Qi Wang

Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy.

Reinforcement Learning (RL)
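As a rough illustration of what "predicting the performance of a policy" means, the sketch below compares exact iterative policy evaluation against tabular TD(0), the textbook sample-based evaluator. It is a minimal sketch under stated assumptions, not the algorithm from the paper: the 3-state Markov reward process (`P`, `R`, `GAMMA`) and the helper names `exact_values` and `td0_values` are hypothetical, chosen only so the example is small and checkable.

```python
import random

# Hypothetical 3-state Markov reward process. The policy is assumed fixed,
# so P[s][t] is the probability of moving from state s to state t under it.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
R = [1.0, 0.0, 2.0]  # hypothetical reward received on leaving each state
GAMMA = 0.8          # discount factor


def exact_values(P, R, gamma, sweeps=500):
    """Iterative policy evaluation: repeat V <- R + gamma * P V (needs the model)."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V


def td0_values(P, R, gamma, alpha=0.001, steps=300_000, seed=0):
    """Tabular TD(0): learn V from one sampled trajectory, without using P directly."""
    rng = random.Random(seed)
    n = len(R)
    V = [0.0] * n
    s = 0
    for _ in range(steps):
        # P is only used here to *simulate* the environment's next state.
        s_next = rng.choices(range(n), weights=P[s])[0]
        td_error = R[s] + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        s = s_next
    return V


if __name__ == "__main__":
    print("exact:", [round(v, 3) for v in exact_values(P, R, GAMMA)])
    print("TD(0):", [round(v, 3) for v in td0_values(P, R, GAMMA)])
```

The constant step size `alpha` is a simplifying choice: TD(0) then hovers in a small neighborhood of the true values rather than converging exactly, which is enough to show that a policy's value can be estimated from samples alone.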

O$^2$TD: (Near)-Optimal Off-Policy TD Learning

no code implementations • 17 Apr 2017 • Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz

Temporal difference (TD) learning and residual gradient (RG) methods are the most widely used TD-based learning algorithms; however, it has been shown that neither of their objective functions is optimal with respect to approximating the true value function $V$.
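The claim that neither objective is optimal for approximating $V$ can be checked on a toy problem. The sketch below, which is an illustration under assumptions and not the O$^2$TD method, uses linear features and computes three weight vectors in closed form: the weighted least-squares projection of the true $V$ (the best achievable fit by construction), the TD fixed point, and the residual-gradient solution (here taken as the minimizer of the mean squared Bellman error under the expected Bellman operator). The chain, features `PHI`, weighting `D`, and helper names are all hypothetical.

```python
import math

# Hypothetical 3-state chain with 2 linear features per state.
P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
R = [1.0, 0.0, 2.0]
GAMMA = 0.8
PHI = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # phi(s) for s = 0, 1, 2
D = [1 / 3, 1 / 3, 1 / 3]  # stationary distribution of P (P is doubly stochastic)


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def galerkin(U, X, y, d):
    """Solve (U^T D X) w = U^T D y with D = diag(d)."""
    UtD = [[U[s][i] * d[s] for s in range(len(d))] for i in range(len(U[0]))]
    return solve(matmul(UtD, X), matvec(UtD, y))


def compare():
    n = len(R)
    # True values: V = (I - gamma P)^{-1} R.
    I_minus = [[(1.0 if i == j else 0.0) - GAMMA * P[i][j] for j in range(n)] for i in range(n)]
    V = solve(I_minus, R)
    # Psi = Phi - gamma P Phi: feature difference used by the Bellman residual.
    PPhi = matmul(P, PHI)
    PSI = [[PHI[s][i] - GAMMA * PPhi[s][i] for i in range(len(PHI[0]))] for s in range(n)]

    w_proj = galerkin(PHI, PHI, V, D)  # best D-weighted fit to V
    w_td = galerkin(PHI, PSI, R, D)    # TD fixed point: Phi^T D (R - Psi w) = 0
    w_rg = galerkin(PSI, PSI, R, D)    # minimizer of the mean squared Bellman error

    def err(w):
        fit = matvec(PHI, w)
        return math.sqrt(sum(D[s] * (fit[s] - V[s]) ** 2 for s in range(n)))

    return {"projection": err(w_proj), "TD fixed point": err(w_td),
            "residual gradient": err(w_rg)}


if __name__ == "__main__":
    for name, e in compare().items():
        print(f"{name:>18}: weighted error to V = {e:.3f}")
```

Because the projection minimizes the weighted error by definition, TD and RG can never beat it; in this toy instance both land noticeably farther from $V$, which is the gap the snippet refers to.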
