Search Results for author: Saad Biaz

Found 2 papers, 0 papers with code

Stable and Efficient Policy Evaluation

no code implementations • 6 Jun 2020 • Daoming Lyu, Bo Liu, Matthieu Geist, Wen Dong, Saad Biaz, Qi Wang

Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy.

Reinforcement Learning (RL)

O$^2$TD: (Near)-Optimal Off-Policy TD Learning

no code implementations • 17 Apr 2017 • Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz

Temporal difference learning and Residual Gradient methods are the most widely used temporal difference based learning algorithms; however, it has been shown that none of their objective functions is optimal w.r.t. approximating the true value function $V$.
