1 code implementation • NeurIPS 2023 • Nishanth Anand, Doina Precup
Temporal difference (TD) learning is often used to update the estimate of the value function, which RL agents rely on to extract useful policies.
1 code implementation • 11 Jun 2021 • Nishanth Anand, Doina Precup
When the agent lands in a state, that state's value can be used to compute the TD-error, which is then propagated to other states.
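The TD-error computation described above can be sketched in a few lines; this is a minimal tabular TD(0) illustration assuming a hypothetical 3-state chain with a fixed step size and discount factor, not the authors' specific method:

```python
import numpy as np

# Minimal TD(0) sketch (hypothetical setup, not the paper's algorithm).
n_states = 3
V = np.zeros(n_states)   # tabular value estimates, one per state
alpha, gamma = 0.1, 0.9  # step size and discount factor (assumed values)

# One hypothetical transition: state 0 -> state 1 with reward 1.0
s, r, s_next = 0, 1.0, 1

# TD-error: gap between the one-step bootstrapped return and V(s)
td_error = r + gamma * V[s_next] - V[s]

# The error updates the value of the state the agent landed in
V[s] += alpha * td_error
print(V[0])  # 0.1 after this single update from V = 0
```

With all values initialized to zero, the TD-error equals the reward (1.0), and the update moves V(0) a fraction alpha of the way toward it.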
no code implementations • 23 May 2019 • Pierre Thodoroff, Nishanth Anand, Lucas Caccia, Doina Precup, Joelle Pineau
Despite recent successes in reinforcement learning, value-based methods often suffer from high variance, which hinders performance.