Search Results for author: Mohammad Gheshlaghi Azar

Found 19 papers, 8 papers with code

Fast computation of Nash Equilibria in Imperfect Information Games

no code implementations ICML 2020 Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls

We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.

Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction

1 code implementation 19 Feb 2021 Mehdi Azabou, Mohammad Gheshlaghi Azar, Ran Liu, Chi-Heng Lin, Erik C. Johnson, Kiran Bhaskaran-Nair, Max Dabagia, Keith B. Hengen, William Gray-Roncal, Michal Valko, Eva L. Dyer

By learning to predict the latent representation of similar samples, we show that it is possible to learn good representations in new domains where augmentations are still limited.

Self-Supervised Learning

Bootstrapped Representation Learning on Graphs

2 code implementations 12 Feb 2021 Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Rémi Munos, Petar Veličković, Michal Valko

Current state-of-the-art self-supervised learning methods for graph neural networks (GNNs) are based on contrastive learning.

Contrastive Learning, Representation Learning

Towards Consistent Performance on Atari using Expert Demonstrations

no code implementations ICLR 2019 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Atari Games

Neural Predictive Belief Representations

no code implementations ICLR 2019 Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos

In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.

Decision Making, Unsupervised Representation Learning

Observe and Look Further: Achieving Consistent Performance on Atari

1 code implementation 29 May 2018 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Montezuma's Revenge

Noisy Networks for Exploration

11 code implementations ICLR 2018 Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Atari Games, Efficient Exploration

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning

no code implementations ICLR 2018 Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos

Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting.

Atari Games, Distributional Reinforcement Learning

Minimax Regret Bounds for Reinforcement Learning

1 code implementation ICML 2017 Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.

Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes

no code implementations 5 Feb 2016 Mohammad Gheshlaghi Azar, Eva Dyer, Konrad Kording

Our approach enables the use of convex optimization tools to solve a class of non-convex optimization problems.

Online Stochastic Optimization under Correlated Bandit Feedback

no code implementations 4 Feb 2014 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.

Stochastic Optimization

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

no code implementations NeurIPS 2013 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents.

Regret Bounds for Reinforcement Learning with Policy Advice

no code implementations 5 May 2013 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.
