Search Results for author: James Murray

Found 1 paper, 0 papers with code

Gradient descent temporal difference-difference learning

no code implementations • 1 Jan 2021 • Rong Zhu, James Murray

Off-policy learning algorithms, in which an agent updates the value function of the optimal policy while selecting actions using an independent exploration policy, provide an effective solution to the explore-exploit tradeoff and have proven to be of great practical value in reinforcement learning.
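To make the off-policy setup in the abstract concrete, here is a minimal sketch using tabular Q-learning, a standard off-policy algorithm. It is not the gradient-based method this paper proposes. The five-state chain environment, the function names, and the hyperparameters are all hypothetical choices for illustration. Actions are chosen uniformly at random (the exploration policy), while the update bootstraps from the greedy action (the policy being evaluated).

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = left, 1 = right


def step(state, action):
    """Move along the chain; reward 1.0 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: uniformly random, independent of q.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Target policy: greedy w.r.t. q (max over next actions).
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(q_learning())` recovers the "always move right" policy on this toy chain even though the agent never acts greedily during training, which is the separation between exploration and evaluation that the abstract describes.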
