1 Jan 2021 • Rong Zhu, James Murray
Off-policy learning algorithms, in which an agent updates the value function of the optimal policy while selecting actions using an independent exploration policy, provide an effective solution to the explore-exploit tradeoff and have proven to be of great practical value in reinforcement learning.
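To make the separation between the two policies concrete, here is a minimal sketch using tabular Q-learning, a standard off-policy algorithm. It is not taken from this paper. The five-state chain environment, the function names `q_learning_chain` and `greedy_policy`, and all hyperparameter values are assumptions chosen for illustration. The behavior policy picks actions uniformly at random and never consults the value estimates, while the update bootstraps from the greedy (target) policy.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain MDP.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behavior policy: uniform random, independent of q.
            a = rng.randrange(2)
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # Target policy: greedy. Bootstrap from the best next action,
            # regardless of what the behavior policy will actually do next.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Return the greedy action index for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Calling `greedy_policy(q_learning_chain())` should select "move right" in every non-terminal state, even though the agent only ever acted at random while learning. That is the off-policy property the abstract describes.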