Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting

NeurIPS 2017 Yue WangWei ChenYuting LiuZhi-Ming MaTie-Yan Liu

In reinforcement learning (RL) , one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good policy evaluation method, the RL algorithms will estimate the value function more accurately and find a better policy... (read more)

PDF Abstract


No code implementations yet. Submit your code now


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper

🤖 No Methods Found Help the community by adding them if they're not listed; e.g. Deep Residual Learning for Image Recognition uses ResNet