Search Results for author: Steven Hansen

Found 10 papers, 2 papers with code

Uniqueness and Complexity of Inverse MDP Models

no code implementations • 2 Jun 2022 • Marcus Hutter, Steven Hansen

In the traditional "forward" view, transition "matrix" p(s'|sa) and policy π(a|s) uniquely determine "everything": the whole dynamics p(as'a's''a''...|s), and with it, the action-conditional state process p(s's''...|saa'a''), the multi-step inverse models p(aa'a''...|ss^i), etc.
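For concreteness, the "uniquely determine" claim is just the chain rule: alternating policy and transition factors expand the whole joint process from p(s'|sa) and π(a|s) alone (this expansion is standard probability, stated here for context rather than quoted from the paper):

    p(as'a's''...|s) = π(a|s) · p(s'|sa) · π(a'|s') · p(s''|s'a') · ...

Each derived object, such as the action-conditional state process or the multi-step inverse models, then follows by conditioning and marginalizing this joint.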

Entropic Desired Dynamics for Intrinsic Control

no code implementations • NeurIPS 2021 • Steven Hansen, Guillaume Desjardins, Kate Baumli, David Warde-Farley, Nicolas Heess, Simon Osindero, Volodymyr Mnih

An agent might be said, informally, to have mastery of its environment when it has maximised the effective number of states it can reliably reach.

Montezuma's Revenge
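"Effective number of states" is naturally read as the perplexity, i.e. the exponential of the entropy, of the distribution over states the agent can reliably reach. Below is a minimal Python sketch of that quantity, using empirical visit counts as a stand-in for the reach distribution; the function name and inputs are illustrative, not code from the paper.

    import numpy as np

    def effective_num_states(visit_counts):
        """Perplexity of the empirical state distribution: exp(entropy).
        Equals the number of states when visitation is uniform and
        approaches 1 as it concentrates on a single state.
        (Illustrative reading of "effective number of states";
        not taken from the paper.)
        """
        p = np.asarray(visit_counts, dtype=float)
        p = p / p.sum()
        p = p[p > 0]                              # treat 0 * log(0) as 0
        return float(np.exp(-(p * np.log(p)).sum()))

    print(effective_num_states([1, 1, 1, 1]))    # 4.0  (uniform reach)
    print(effective_num_states([97, 1, 1, 1]))   # ~1.18 (concentrated reach)

Under this reading, mastery grows both by reaching more states and by reaching them more reliably and evenly.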

Wasserstein Distance Maximizing Intrinsic Control

no code implementations • 28 Oct 2021 • Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih

This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.

Learning more skills through optimistic exploration

no code implementations • ICLR 2022 • DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen

However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications. The result is low intrinsic reward for the agent, effectively penalizing exactly the sort of exploration needed to maximize the objective.
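To see the failure mode concretely: the standard skill-discovery intrinsic reward is log q(z|s) under a learned discriminator q, which is near log(1/K) for K skills precisely on novel states where q is untrained. The sketch below shows that base reward plus an ensemble-disagreement bonus of the kind the paper proposes to offset it; the exact functional form here is an assumption, not quoted from the abstract.

    import numpy as np

    def skill_reward(logits, z):
        """Base intrinsic reward log q(z|s) from a single discriminator.
        On a novel state the logits are near-uniform, so the reward is
        near log(1/K): the exploration penalty the excerpt describes."""
        log_q = logits - np.logaddexp.reduce(logits)   # log-softmax
        return log_q[z]

    def disagreement_bonus(ensemble_probs):
        """Assumed form of an ensemble-disagreement bonus: entropy of the
        mean prediction minus the mean entropy of individual predictions.
        High where ensemble members disagree (novel states), ~0 once
        they converge."""
        def entropy(p):
            p = np.clip(p, 1e-12, 1.0)
            return -(p * np.log(p)).sum(axis=-1)
        mean_p = ensemble_probs.mean(axis=0)           # shape: (num_skills,)
        return entropy(mean_p) - entropy(ensemble_probs).mean()

Because the bonus vanishes as the ensemble members agree, it rewards novelty only while the discriminator is still uncertain, counteracting the penalty described above.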

Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning

no code implementations • 24 Feb 2021 • Víctor Campos, Pablo Sprechmann, Steven Hansen, Andre Barreto, Steven Kapturowski, Alex Vitvitskyi, Adrià Puigdomènech Badia, Charles Blundell

We introduce Behavior Transfer (BT), a technique that leverages pre-trained policies for exploration and that is complementary to transferring neural network weights.

reinforcement-learning • Unsupervised Pre-training
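A minimal sketch of the stated idea, i.e. using a pre-trained policy as an exploration behavior alongside the policy being learned; the mixing scheme and the names (learner, pretrained, mix_prob) are illustrative assumptions rather than the paper's exact algorithm.

    import random

    def behavior_transfer_action(state, learner, pretrained, mix_prob=0.1):
        """With probability mix_prob, act with the pre-trained behavior
        policy to drive exploration on the new task; otherwise act with
        the policy being learned. Complements, rather than replaces,
        initializing the learner from pre-trained weights.
        (Illustrative mixing scheme, not the paper's exact algorithm.)"""
        if random.random() < mix_prob:
            return pretrained.act(state)   # transferred exploratory behavior
        return learner.act(state)          # task policy being optimized

The point of this arrangement is that the transferred behavior shapes which states are visited, independently of whatever is transferred through network weights.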

Relative Variational Intrinsic Control

no code implementations • 14 Dec 2020 • Kate Baumli, David Warde-Farley, Steven Hansen, Volodymyr Mnih

In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment.

Hierarchical Reinforcement Learning • reinforcement-learning

Generalization of Reinforcement Learners with Working and Episodic Memory

1 code implementation • NeurIPS 2019 • Meire Fortunato, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charlie Deck, Joel Z. Leibo, Charles Blundell

In this paper, we aim to develop a comprehensive methodology for testing different kinds of memory in an agent. We assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions we suggest are relevant for evaluating memory-specific generalization.

Fast Task Inference with Variational Intrinsic Successor Features

no code implementations • ICLR 2020 • Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).
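The "rewarded for being distinguishable" objective in the cited works is a variational lower bound on the mutual information between a skill variable z and the states its policy visits; in standard form (the Barber–Agakov bound, stated here for context rather than quoted from this paper):

    I(Z; S) >= E_{z~p(z), s~π_z}[log q(z|s)] - E_{z~p(z)}[log p(z)]

so each skill policy π_z can be trained with intrinsic reward r_z(s) = log q(z|s) - log p(z), which is high exactly when the learned discriminator q can identify the skill from the states it produces.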

Unsupervised Control Through Non-Parametric Discriminative Rewards

no code implementations • ICLR 2019 • David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih

Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research.

reinforcement-learning

Fast deep reinforcement learning using online adjustments from the past

1 code implementation • NeurIPS 2018 • Steven Hansen, Pablo Sprechmann, Alexander Pritzel, André Barreto, Charles Blundell

We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer.

Atari Games • reinforcement-learning
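At a high level, the "rapid adaptation" is a convex blend of the parametric Q-network output with a non-parametric value estimate computed from trajectories near the current state in the replay buffer. A sketch of that blend follows; the λ-mixing reflects the paper's description, while the computation of q_non_param is simplified away and the names are illustrative.

    import numpy as np

    def eva_q_values(q_param, q_non_param, lam=0.4):
        """Ephemeral Value Adjustments: act greedily with respect to a
        convex blend of the parametric Q-network output and a
        non-parametric estimate backed up along replay-buffer
        trajectories near the current state. (How q_non_param is
        computed is simplified away here; names are illustrative.)"""
        return lam * q_param + (1.0 - lam) * q_non_param

    # q_param: network output for the current state, shape (num_actions,)
    # q_non_param: value backups along retrieved replay trajectories
    q = eva_q_values(np.array([0.1, 0.5]), np.array([0.3, 0.2]))
    action = int(np.argmax(q))        # greedy action under blended values

Because the non-parametric term is recomputed from recent experience rather than learned by gradient descent, it can shift the agent's behavior much faster than updates to the network weights.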
