Search Results for author: Anna Harutyunyan

Found 16 papers, 2 papers with code

Bootstrapped Representations in Reinforcement Learning

no code implementations · 16 Jun 2023 · Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988).

Tasks: Auxiliary Learning, reinforcement-learning, +1
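For reference, a minimal sketch of the temporal-difference update the abstract builds on, in its simplest tabular TD(0) form (illustrative only; the paper's contribution is a characterization of the representations such updates induce, not this update itself):

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
        # Move V(s) toward the bootstrapped target r + gamma * V(s').
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        return V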

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

no code implementations · 29 May 2023 · Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.
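A minimal sketch of the multi-step lookahead mentioned above, assuming a recorded trajectory and a value table V (the paper's doubly multi-step estimator builds on returns of this general shape, but this is not its algorithm):

    def n_step_return(rewards, V, s_after_n, n, gamma=0.99):
        # Sum the first n discounted rewards, then bootstrap from the
        # value estimate at the state reached after n steps.
        G = sum(gamma ** k * r for k, r in enumerate(rewards[:n]))
        return G + gamma ** n * V[s_after_n]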

An Analysis of Quantile Temporal-Difference Learning

no code implementations · 11 Jan 2023 · Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.

Tasks: Distributional Reinforcement Learning, reinforcement-learning, +1
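A minimal tabular sketch of a QTD-style update, assuming m quantile estimates per state (step size and discount are illustrative):

    def qtd_update(theta, s, r, s_next, alpha=0.05, gamma=0.99):
        # theta[s] holds m quantile estimates at levels tau_i = (2i + 1) / (2m).
        m = len(theta[s])
        taus = [(2 * i + 1) / (2 * m) for i in range(m)]
        # Bootstrapped targets built from the next state's quantile estimates.
        targets = [r + gamma * t for t in theta[s_next]]
        for i in range(m):
            # Estimate i moves up when targets sit above it (weight tau_i)
            # and down when targets sit below it (weight 1 - tau_i).
            frac_below = sum(g < theta[s][i] for g in targets) / m
            theta[s][i] += alpha * (taus[i] - frac_below)
        return theta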

On the Expressivity of Markov Reward

no code implementations · NeurIPS 2021 · David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.
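As an illustration of reward expressivity (a hypothetical checker, not one of the paper's construction algorithms): for a task given as a set of acceptable actions per state, one can test whether a candidate Markov reward expresses it by value-iterating a small tabular MDP and comparing the induced greedy action sets against the acceptable ones.

    import numpy as np

    def greedy_action_sets(P, R, gamma=0.9, iters=500):
        # P[s][a] is a list of (prob, next_state) pairs; R[s][a] is a scalar.
        n_s, n_a = len(P), len(P[0])
        V = np.zeros(n_s)
        for _ in range(iters):
            Q = np.array([[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                           for a in range(n_a)] for s in range(n_s)])
            V = Q.max(axis=1)
        # Actions within numerical tolerance of the optimum at each state.
        return [set(np.flatnonzero(np.isclose(Q[s], Q[s].max()))) for s in range(n_s)]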

Useful Policy Invariant Shaping from Arbitrary Advice

no code implementations · 2 Nov 2020 · Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, Michael Bowling

Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered.
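The guarantee in that sentence is potential-based reward shaping (Ng et al., 1999); a minimal sketch, assuming a user-supplied potential function:

    def shaped_reward(r, s, s_next, potential, gamma=0.99):
        # F(s, s') = gamma * phi(s') - phi(s) telescopes along any trajectory,
        # so adding it to the reward leaves optimal policies unchanged.
        return r + gamma * potential(s_next) - potential(s)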

The Termination Critic

no code implementations · 26 Feb 2019 · Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Rémi Munos, Doina Precup

In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents.

Learning with Options that Terminate Off-Policy

no code implementations · 10 Nov 2017 · Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient.

Q(λ) with Off-Policy Corrections

no code implementations · 16 Feb 2016 · Anna Harutyunyan, Marc G. Bellemare, Tom Stepleton, Rémi Munos

We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities.
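A minimal sketch of the corrected return described above, computed over a finite behavior trajectory (variable names illustrative; the infinite sum is truncated at the trajectory's end):

    def q_lambda_return(trajectory, Q, pi, lam=0.9, gamma=0.99):
        # trajectory: list of (s, a, r, s_next) tuples from the behavior policy.
        # Q[s][a]: current action-value estimates; pi[s]: target policy probabilities.
        s0, a0 = trajectory[0][0], trajectory[0][1]
        G = Q[s0][a0]
        for t, (s, a, r, s_next) in enumerate(trajectory):
            # The correction uses the current Q-function (an expectation under
            # the target policy), not importance ratios over transitions.
            expected_q = sum(p * Q[s_next][b] for b, p in enumerate(pi[s_next]))
            delta = r + gamma * expected_q - Q[s][a]
            G += (gamma * lam) ** t * delta
        return G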

Off-Policy Reward Shaping with Ensembles

no code implementations · 11 Feb 2015 · Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe

While PBRS is proven to always preserve optimal policies, its effect on learning speed is determined by the quality of its potential function, which, in turn, depends on both the underlying heuristic and the scale.
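One simple way an ensemble of differently-shaped learners can act (a hypothetical combination rule for illustration, not necessarily the paper's): let each shaped Q-function vote for its greedy action.

    from collections import Counter

    def ensemble_action(Qs, s):
        # Each learner votes for its greedy action in state s; plurality wins,
        # with ties broken by first occurrence.
        votes = Counter(max(range(len(Q[s])), key=lambda a: Q[s][a]) for Q in Qs)
        return votes.most_common(1)[0][0]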

Off-Policy Shaping Ensembles in Reinforcement Learning

no code implementations · 21 May 2014 · Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe

Recent advances in gradient temporal-difference methods make it possible to learn multiple value functions in parallel, off-policy, without sacrificing convergence guarantees or computational efficiency.

Tasks: Computational Efficiency, reinforcement-learning, +1
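A minimal sketch of one such gradient-TD update (TDC, in the style of Sutton et al., 2009) with linear features; importance weights are omitted for brevity:

    import numpy as np

    def tdc_update(theta, w, phi, r, phi_next, alpha=0.01, beta=0.1, gamma=0.99):
        # theta: value weights; w: auxiliary weights estimating E[delta * phi].
        delta = r + gamma * theta @ phi_next - theta @ phi
        theta = theta + alpha * (delta * phi - gamma * phi_next * (phi @ w))
        w = w + beta * (delta - phi @ w) * phi
        return theta, w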
