Search Results for author: James Murray

Found 1 paper, 0 papers with code

Gradient descent temporal difference-difference learning

no code implementations • 1 Jan 2021 • Rong Zhu, James Murray

Off-policy learning algorithms, in which an agent updates the value function of the optimal policy while selecting actions using an independent exploration policy, provide an effective solution to the explore-exploit tradeoff and have proven to be of great practical value in reinforcement learning.

