no code implementations • 5 Jul 2019 • Brendan Bennett, Wesley Chung, Muhammad Zaheer, Vincent Liu
Temporal difference methods enable efficient, incremental estimation of value functions in reinforcement learning, and are of broader interest because they correspond to learning as observed in biological systems.
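As a minimal illustration of the incremental estimation the abstract refers to (not the paper's own algorithm), the following sketch runs TD(0) on a small deterministic chain with hypothetical dynamics: each transition updates the value of the current state toward the one-step bootstrapped target.

```python
def td0_chain(num_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """TD(0) value estimation on a toy chain MDP (illustrative only).

    States 0..num_states-1; the agent walks right and receives reward 1
    on reaching the terminal state. Values are updated incrementally
    from each observed transition, with no model and no replay.
    """
    V = [0.0] * num_states  # value estimates, refined online
    for _ in range(episodes):
        s = 0
        while s < num_states - 1:
            s_next = s + 1  # deterministic chain for simplicity
            r = 1.0 if s_next == num_states - 1 else 0.0
            # TD(0) update: move V(s) toward r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V
```

With the discount 0.9, the estimates approach 0.9^k for a state k steps before the rewarding transition, e.g. `td0_chain()[3]` converges toward 1.0 and `td0_chain()[2]` toward 0.9.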
no code implementations • 12 Jun 2018 • Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White
We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states, which propagate information backward from a state more quickly.
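A hypothetical sketch of the predecessor idea (in the spirit of prioritized sweeping, not the paper's exact method): a model of experienced transitions records the predecessors of each state, so that when a state's value changes, planning can immediately update the states that lead into it, propagating the change backward. The function names and data layout here are illustrative assumptions.

```python
from collections import defaultdict

def build_predecessors(transitions):
    """Record predecessors from experience.

    transitions: iterable of (s, a, s_next) tuples. Returns a mapping
    s_next -> set of (s, a) pairs that were observed to reach s_next.
    """
    preds = defaultdict(set)
    for s, a, s_next in transitions:
        preds[s_next].add((s, a))  # who can reach s_next, and how
    return preds

def planning_candidates(preds, changed_state):
    """States worth updating next: predecessors of a state whose
    value estimate just changed, so the change propagates in reverse."""
    return sorted(preds[changed_state])

# Illustrative usage on three observed transitions:
transitions = [(0, "right", 1), (1, "right", 2), (0, "up", 2)]
preds = build_predecessors(transitions)
candidates = planning_candidates(preds, 2)
```

A replay buffer can only resample transitions as stored; the predecessor map above lets planning query "which states reach here?", which is the sampling question the abstract highlights.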