no code implementations • 2 Jun 2022 • Marcus Hutter, Steven Hansen
In the traditional "forward" view, transition "matrix" p(s'|sa) and policy π(a|s) uniquely determine "everything": the whole dynamics p(as'a's''a''...|s), and with it, the action-conditional state process p(s's''...|saa'a''), the multi-step inverse models p(aa'a''...|ss^i), etc.
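For concreteness, a sketch of the factorization the forward view implies (assuming the usual Markov property; only the one-step inverse model is shown, the multi-step versions follow by the same conditioning):

```latex
% Policy and transition matrix jointly determine the full forward dynamics
p(a\,s'\,a'\,s''\,a''\ldots \mid s)
  \;=\; \pi(a\mid s)\, p(s'\mid s,a)\, \pi(a'\mid s')\, p(s''\mid s',a')\cdots
% Conditioning then yields, e.g., the one-step inverse model via Bayes' rule:
p(a \mid s, s')
  \;=\; \frac{\pi(a\mid s)\, p(s'\mid s,a)}{\sum_{\tilde a} \pi(\tilde a\mid s)\, p(s'\mid s,\tilde a)}
```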
no code implementations • NeurIPS 2021 • Steven Hansen, Guillaume Desjardins, Kate Baumli, David Warde-Farley, Nicolas Heess, Simon Osindero, Volodymyr Mnih
An agent might be said, informally, to have mastery of its environment when it has maximised the effective number of states it can reliably reach.
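One plausible way to make "effective number of states reliably reached" concrete (this particular formalisation is an illustrative assumption, not necessarily the paper's) is the perplexity, exp of the entropy, of the empirical distribution over goal states the agent reaches with high success rate:

```python
import numpy as np

def effective_num_reachable_states(reach_counts, reliability, min_reliability=0.9):
    """Effective number of states the agent reliably reaches (illustrative).

    reach_counts : per-state counts of successful goal-reaching attempts
    reliability  : per-state success rate when that state is set as the goal
    States below the reliability threshold are dropped; the rest contribute
    exp(Shannon entropy) of the normalised reach distribution (its perplexity).
    """
    reach_counts = np.asarray(reach_counts, dtype=float)
    reliability = np.asarray(reliability, dtype=float)
    counts = reach_counts * (reliability >= min_reliability)
    if counts.sum() == 0:
        return 0.0
    p = counts / counts.sum()
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return float(np.exp(entropy))
```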
no code implementations • 28 Oct 2021 • Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih
This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.
no code implementations • ICLR 2022 • DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen
However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce an accurate and confident skill classification. The result is a low intrinsic reward for the agent, effectively penalizing exactly the sort of exploration needed to maximize the objective.
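A minimal sketch of the standard discriminator-based intrinsic reward this problem refers to (the network and names are illustrative assumptions): the reward is the log-probability the discriminator assigns to the current skill, so at a state the discriminator has barely trained on, its near-uniform output gives a reward close to log(1/num_skills) and exploration is penalized.

```python
import torch
import torch.nn as nn

num_skills = 16

# Illustrative skill discriminator q(z | s): predicts which skill produced a state.
discriminator = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, num_skills),
)

def skill_intrinsic_reward(state, skill_id):
    """r(s, z) = log q(z|s) - log p(z), with p(z) uniform over skills.

    On novel states an undertrained discriminator is near-uniform, so
    log q(z|s) ~= log(1/num_skills) and the reward is roughly zero at best,
    which is the exploration penalty described above.
    """
    logits = discriminator(state)
    log_q = torch.log_softmax(logits, dim=-1)[..., skill_id]
    log_p = torch.log(torch.tensor(1.0 / num_skills))
    return (log_q - log_p).detach()
```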
no code implementations • 24 Feb 2021 • Víctor Campos, Pablo Sprechmann, Steven Hansen, Andre Barreto, Steven Kapturowski, Alex Vitvitskyi, Adrià Puigdomènech Badia, Charles Blundell
We introduce Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights.
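A hedged sketch of how a pre-trained policy might be leveraged for exploration (this particular per-step mixing scheme is an illustrative assumption, not necessarily BT's exact mechanism): during data collection, some fraction of actions come from the frozen pre-trained policy rather than the task policy being learned.

```python
import random

def collect_episode(env, task_policy, pretrained_policy, behavior_prob=0.2):
    """Roll out one episode, occasionally acting with a frozen pre-trained policy.

    Only the *behaviour* of the pre-trained policy is reused; its weights are
    never copied into the task policy, which is what makes this form of
    transfer complementary to transferring network weights.
    """
    transitions = []
    obs = env.reset()
    done = False
    while not done:
        if random.random() < behavior_prob:
            action = pretrained_policy(obs)   # exploratory behaviour from pre-training
        else:
            action = task_policy(obs)         # policy being trained on the new task
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return transitions
```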
no code implementations • 14 Dec 2020 • Kate Baumli, David Warde-Farley, Steven Hansen, Volodymyr Mnih
In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment.
1 code implementation • NeurIPS 2019 • Meire Fortunato, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charlie Deck, Joel Z. Leibo, Charles Blundell
In this paper, we aim to develop a comprehensive methodology to test different kinds of memory in an agent and assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions that we suggest are relevant for evaluating memory-specific generalization.
no code implementations • ICLR 2020 • Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih
It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).
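The distinguishability reward in these cited works comes from a standard variational lower bound on the mutual information between skills and the states they reach, with q a learned skill discriminator:

```latex
I(S;Z) \;=\; H(Z) - H(Z\mid S)
       \;\ge\; H(Z) + \mathbb{E}_{z\sim p(z),\, s\sim \pi_z}\!\left[\log q(z\mid s)\right]
```

so a policy conditioned on z is rewarded with r(s, z) = log q(z|s) − log p(z), i.e. for visiting states from which its own skill can be told apart from the others.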
no code implementations • ICLR 2019 • David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih
Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research.
1 code implementation • NeurIPS 2018 • Steven Hansen, Pablo Sprechmann, Alexander Pritzel, André Barreto, Charles Blundell
We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer.
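A minimal sketch of the value-blending idea behind EVA: the action values used for acting are a λ-weighted mix of the parametric network's estimates and non-parametric estimates derived from experience near the current state in the replay buffer (the nearest-neighbour lookup here is a simplified stand-in for the paper's trajectory-centric backups).

```python
import numpy as np

def eva_q_values(q_param, state_embedding, replay_keys, replay_q_np, lam=0.4, k=10):
    """Blend parametric and non-parametric action values, EVA-style (simplified).

    q_param         : (num_actions,) Q-values from the parametric network for the current state
    state_embedding : (d,) embedding of the current state
    replay_keys     : (N, d) embeddings of states stored in the replay buffer
    replay_q_np     : (N, num_actions) non-parametric value estimates attached to those states
    """
    # k nearest neighbours of the current state in the replay buffer
    dists = np.linalg.norm(replay_keys - state_embedding, axis=1)
    nn_idx = np.argsort(dists)[:k]
    q_np = replay_q_np[nn_idx].mean(axis=0)
    # Q_EVA = lambda * Q_theta + (1 - lambda) * Q_NP
    return lam * q_param + (1.0 - lam) * q_np
```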