no code implementations • 25 Sep 2022 • Matthew J. Sargent, Peter J. Bentley, Caswell Barry, William de Cothi
We show that in environments with dynamic reward structure, t-SR (a temporally extended variant of the successor representation) is able to leverage both the flexibility of the successor representation and the abstraction afforded by temporally extended actions.
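The "flexibility" of the successor representation (SR) comes from factoring value into two parts. The first is a matrix M of expected discounted future state occupancies under a fixed policy, M = Σ_t γ^t P^t. The second is a per-state reward vector r, giving V = M r. When rewards move, only r changes and M can be reused. The sketch below illustrates this for the standard one-step, tabular SR only. It is not the t-SR method from the paper. The ring environment, the discount γ = 0.9, and the helper names `successor_representation` and `values_from_sr` are hypothetical choices made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def successor_representation(P: Matrix, gamma: float, tol: float = 1e-10,
                             max_iters: int = 10_000) -> Matrix:
    """Standard tabular SR for a fixed policy with transition matrix P.

    Computes M = sum_t gamma^t P^t by iterating the fixed-point recursion
    M <- I + gamma * P @ M, which converges because gamma < 1.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(max_iters):
        new_M = [
            [identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
        delta = max(abs(new_M[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = new_M
        if delta < tol:
            break
    return M


def values_from_sr(M: Matrix, reward: List[float]) -> List[float]:
    """V = M r: value of every state for a given per-state reward vector."""
    return [sum(m_ij * r_j for m_ij, r_j in zip(row, reward)) for row in M]


# Hypothetical environment: 4 states on a ring; the policy always steps to the next state.
P_ring = [[1.0 if j == (i + 1) % 4 else 0.0 for j in range(4)] for i in range(4)]
M = successor_representation(P_ring, gamma=0.9)

# The reward moves from state 3 to state 1. M is reused unchanged;
# only the product M r is recomputed.
v_before = values_from_sr(M, [0.0, 0.0, 0.0, 1.0])
v_after = values_from_sr(M, [0.0, 1.0, 0.0, 0.0])
```

Relearning M is the expensive part when rewards are fixed but the environment is unknown. Keeping it separate from r is what lets an SR agent re-plan cheaply after a reward change.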
no code implementations • 14 Sep 2022 • Augustine N. Mavor-Parker, Matthew J. Sargent, Christian Pehle, Andrea Banino, Lewis D. Griffin, Caswell Barry
Reinforcement learning agents must painstakingly learn, through trial and error, which sets of state-action pairs are value equivalent, which often requires a prohibitively large amount of environment experience.