no code implementations • 25 Sep 2019 • Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Hao Su, Henrik Iskov Christensen
Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns distillation of multiple value functions and MDP embeddings from only existing data.
no code implementations • NeurIPS 2020 • Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Keith Ross, Henrik Iskov Christensen, Hao Su
To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, actions and rewards.