no code implementations • 16 Jun 2023 • Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney
In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988).
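For context, the tabular TD(0) update at the heart of that analysis is sketched below; the dictionary-style value table, step size, and discount are illustrative assumptions, not details from the paper.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD(0) step: nudge V(s) toward the bootstrapped target."""
    td_error = r + gamma * V[s_next] - V[s]  # reward plus discounted next-state value
    V[s] += alpha * td_error
    return V
```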
no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko
Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
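A minimal sketch of the n-step lookahead target behind such methods; the interface (a list of n rewards plus a bootstrap state) is an assumption for illustration.

```python
def n_step_return(rewards, V, s_n, gamma=0.99):
    """n-step target: n discounted rewards plus a bootstrapped value
    at the state reached after the lookahead."""
    G = sum(gamma**t * r for t, r in enumerate(rewards))
    return G + gamma**len(rewards) * V[s_n]
```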
no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.
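A tabular sketch of the QTD update under the common midpoint quantile parameterisation; the per-state quantile arrays and step size are illustrative, not the paper's exact setup.

```python
import numpy as np

def qtd_update(theta, theta_next, r, alpha=0.05, gamma=0.99):
    """One tabular QTD step for a single state transition.

    theta:      (m,) quantile estimates of the return at the current state
    theta_next: (m,) quantile estimates at the successor state
    """
    m = len(theta)
    tau = (2 * np.arange(m) + 1) / (2 * m)  # midpoint quantile levels
    targets = r + gamma * theta_next        # sample target quantiles
    for i in range(m):
        # quantile-regression step: tau_i minus the fraction of targets below theta_i
        theta[i] += alpha * np.mean(tau[i] - (targets < theta[i]))
    return theta
```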
no code implementations • NeurIPS 2021 • David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh
We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.
no code implementations • 1 Jan 2021 • Thomas Mesnard, Theophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Marcus Hutter, Lars Holger Buesing, Remi Munos
Credit assignment in reinforcement learning is the problem of measuring an action’s influence on future rewards.
no code implementations • 18 Nov 2020 • Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.
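To make the problem concrete: the naive solution credits each action with the entire discounted return that follows it, which is unbiased but high-variance. A minimal sketch of that baseline (not the paper's method):

```python
def discounted_returns(rewards, gamma=0.99):
    """Naive credit assignment: the action at step t is credited with the
    full discounted return from t onward (high variance, no counterfactuals)."""
    G, credits = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        credits.append(G)
    return credits[::-1]
```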
no code implementations • 2 Nov 2020 • Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, Michael Bowling
Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered.
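The guarantee referenced here is the classic potential-based shaping result (Ng et al., 1999); a minimal sketch, where `potential` stands for any user-supplied heuristic Φ.

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Add F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward;
    shaping of this form provably leaves optimal policies unchanged."""
    return r + gamma * potential(s_next) - potential(s)
```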
1 code implementation • NeurIPS 2019 • Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos
We consider the problem of efficient credit assignment in reinforcement learning.
no code implementations • 16 Oct 2019 • Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney
We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
no code implementations • 26 Feb 2019 • Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup
In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents.
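For readers unfamiliar with the term, an option in the standard framework (Sutton, Precup & Singh, 1999) is a triple of initiation set, intra-option policy, and termination condition; the sketch below fixes illustrative types.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """A temporally extended behaviour: where it may start, how it acts,
    and when it terminates."""
    can_initiate: Callable[[int], bool]       # initiation set I
    policy: Callable[[int], int]              # intra-option policy pi
    termination_prob: Callable[[int], float]  # termination condition beta(s)
```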
no code implementations • 10 Nov 2017 • Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe
Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.
no code implementations • 22 Aug 2017 • Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé
Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability.
3 code implementations • NeurIPS 2016 • Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning.
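This is the paper that introduced Retrace(λ). A sketch of its per-trajectory target under the usual indexing conventions, with the truncated trace c_t = λ min(1, ρ_t) as the key ingredient; the list-based interface is an assumption for illustration.

```python
def retrace_target(q0, deltas, rhos, lam=1.0, gamma=0.99):
    """Retrace(lambda) target for Q(x_0, a_0) along one trajectory.

    deltas[t]: TD error r_t + gamma * E_pi[Q(x_{t+1}, .)] - Q(x_t, a_t)
    rhos[t]:   importance ratio pi(a_t | x_t) / mu(a_t | x_t)
    """
    target, trace = q0, 1.0
    for t in range(len(deltas)):
        if t > 0:
            trace *= lam * min(1.0, rhos[t])  # truncated trace: safe off-policy
        target += gamma**t * trace * deltas[t]
    return target
```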
no code implementations • 16 Feb 2016 • Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Remi Munos
We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities.
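A sketch of that reward-space correction: each TD error bootstraps with the expected Q-value under the target policy rather than reweighting transitions by importance ratios. Argument names and the list-based interface are illustrative.

```python
def q_pi_lambda_target(q0, rewards, exp_q_next, q_taken, lam=0.9, gamma=0.99):
    """Off-policy multi-step target corrected in reward space (no ratios).

    exp_q_next[t]: E_pi[Q(x_{t+1}, .)] under the target policy
    q_taken[t]:    Q(x_t, a_t) for the behaviour action actually taken
    """
    target = q0
    for t, r in enumerate(rewards):
        delta = r + gamma * exp_q_next[t] - q_taken[t]
        target += (gamma * lam)**t * delta
    return target
```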
no code implementations • 11 Feb 2015 • Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe
While PBRS is proven to always preserve optimal policies, its effect on learning speed is determined by the quality of its potential function, which, in turn, depends on both the underlying heuristic and the scale.
no code implementations • 21 May 2014 • Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe
Recent advances in gradient temporal-difference methods make it possible to learn multiple value functions in parallel, off-policy, without sacrificing convergence guarantees or computational efficiency.
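One gradient-TD method in that family is TDC; a minimal sketch with linear features, where running several instances over a shared behaviour stream trains several value functions in parallel. Step sizes, feature shapes, and the rho-weighted off-policy form are illustrative assumptions.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, r, rho=1.0,
               alpha=0.01, beta=0.1, gamma=0.99):
    """One TDC (gradient-TD) step with linear features.

    rho is the importance ratio pi(a|s) / mu(a|s); gradient-TD methods
    remain convergent off-policy, so many value functions can share
    one behaviour stream.
    """
    delta = r + gamma * theta @ phi_next - theta @ phi  # linear TD error
    theta += alpha * rho * (delta * phi - gamma * phi_next * (w @ phi))
    w += beta * rho * (delta - w @ phi) * phi           # auxiliary weights
    return theta, w
```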