1 code implementation • NeurIPS 2023 • Nishanth Anand, Doina Precup
Temporal difference (TD) learning is often used to update the estimate of the value function, which RL agents rely on to extract useful policies.
1 code implementation • 11 Jun 2021 • Nishanth Anand, Doina Precup
When the agent lands in a state, that state's value can be used to compute the TD-error, which is then propagated to other states.
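The TD-error computation described above can be sketched in a few lines; this is a minimal tabular TD(0) illustration assuming a hypothetical 3-state chain with a fixed step size and discount factor, not the authors' specific method:

```python
import numpy as np

# Minimal TD(0) sketch (hypothetical setup, not the paper's algorithm).
n_states = 3
V = np.zeros(n_states)   # tabular value estimates, one per state
alpha, gamma = 0.1, 0.9  # step size and discount factor (assumed values)

# One hypothetical transition: state 0 -> state 1 with reward 1.0
s, r, s_next = 0, 1.0, 1

# TD-error: gap between the one-step bootstrapped return and V(s)
td_error = r + gamma * V[s_next] - V[s]

# The error updates the value of the state the agent landed in
V[s] += alpha * td_error
print(V[0])  # 0.1 after this single update from V = 0
```

With all values initialized to zero, the TD-error equals the reward (1.0), and the update moves V(0) a fraction alpha of the way toward it.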
no code implementations • 23 May 2019 • Pierre Thodoroff, Nishanth Anand, Lucas Caccia, Doina Precup, Joelle Pineau
Despite recent successes in reinforcement learning, value-based methods often suffer from high variance, which hinders performance.