Search Results for author: Mohammad Gheshlaghi Azar

Found 28 papers, 10 papers with code

Fast computation of Nash Equilibria in Imperfect Information Games

no code implementations • ICML 2020 • Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls

We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.

Paper
Add Code

Nash Learning from Human Feedback

no code implementations • 1 Dec 2023 • Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

We term this approach Nash learning from human feedback (NLHF).

Text Summarization

Paper
Add Code

A General Theoretical Paradigm to Understand Learning from Human Preferences

1 code implementation • 18 Oct 2023 • Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

In particular we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations.

1,612

Paper
Code

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.

regression Reinforcement Learning (RL)

Paper
Code

An Analysis of Quantile Temporal-Difference Learning

no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.

Distributional Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Understanding Self-Predictive Learning for Reinforcement Learning

no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

We identify that a faster paced optimization of the predictor and semi-gradient updates on the representation, are crucial to preventing the representation collapse.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

BYOL-Explore: Exploration by Bootstrapped Prediction

no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.

Paper
Add Code

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity

1 code implementation • NeurIPS 2021 • Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer

Our approach combines a generative modeling framework with an instance-specific alignment loss that tries to maximize the representational similarity between transformed views of the input (brain state).

Paper
Code

Bootstrapped Representation Learning on Graphs

no code implementations • ICLR Workshop GTRL 2021 • Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Remi Munos, Petar Veličković, Michal Valko

Current state-of-the-art self-supervised learning methods for graph neural networks are based on contrastive learning.

Contrastive Learning Representation Learning +1

Paper
Add Code

Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction

1 code implementation • 19 Feb 2021 • Mehdi Azabou, Mohammad Gheshlaghi Azar, Ran Liu, Chi-Heng Lin, Erik C. Johnson, Kiran Bhaskaran-Nair, Max Dabagia, Bernardo Avila-Pires, Lindsey Kitchell, Keith B. Hengen, William Gray-Roncal, Michal Valko, Eva L. Dyer

State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a sample.

Self-Supervised Learning

Paper
Code

Large-Scale Representation Learning on Graphs via Bootstrapping

3 code implementations • ICLR 2022 • Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Veličković, Michal Valko

To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input.

Contrastive Learning Graph Representation Learning +1

Paper
Code

Geometric Entropic Exploration

no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

Exploration is essential for solving complex Reinforcement Learning (RL) tasks.

Reinforcement Learning (RL)

Paper
Add Code

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

8 code implementations • NeurIPS 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.

Representation Learning Self-Supervised Learning

3,081

Paper
Code

The Advantage Regret-Matching Actor-Critic

no code implementations • 27 Aug 2020 • Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls

In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior.

counterfactual Reinforcement Learning (RL)

Paper
Add Code

Bootstrap your own latent: A new approach to self-supervised Learning

31 code implementations • 13 Jun 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.

Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k

Representation Learning Self-Supervised Image Classification +3

12,788

Paper
Code

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

no code implementations • ICML 2020 • Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar

These latent embeddings are themselves trained to be predictive of the aforementioned representations.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Towards Consistent Performance on Atari using Expert Demonstrations

no code implementations • ICLR 2019 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Atari Games Reinforcement Learning (RL)

Paper
Add Code

World Discovery Models

1 code implementation • 20 Feb 2019 • Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo Avila Pires, Jean-bastien Grill, Florent Altché, Rémi Munos

As humans we are driven by a strong desire for seeking novelty in our world.

Paper
Code

Neural Predictive Belief Representations

no code implementations • 15 Nov 2018 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos

In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.

Decision Making Representation Learning

Paper
Add Code

Observe and Look Further: Achieving Consistent Performance on Atari

no code implementations • 29 May 2018 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Montezuma's Revenge Reinforcement Learning (RL)

Paper
Add Code

Noisy Networks for Exploration

15 code implementations • ICLR 2018 • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Ranked #1 on Atari Games on Atari 2600 Surround

Atari Games Efficient Exploration +2

2,534

Paper
Code

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning

no code implementations • ICLR 2018 • Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc Bellemare, Remi Munos

Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting.

Ranked #6 on Atari Games on Atari 2600 Crazy Climber

Atari Games Distributional Reinforcement Learning +1

Paper
Add Code

Minimax Regret Bounds for Reinforcement Learning

1 code implementation • ICML 2017 • Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes

no code implementations • 5 Feb 2016 • Mohammad Gheshlaghi Azar, Eva Dyer, Konrad Kording

Our approach enables the use of convex optimization tools to solve a class of non-convex optimization problems.

regression

Paper
Add Code

Online Stochastic Optimization under Correlated Bandit Feedback

no code implementations • 4 Feb 2014 • Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback.

Stochastic Optimization

Paper
Add Code

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

no code implementations • NeurIPS 2013 • Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Regret Bounds for Reinforcement Learning with Policy Advice

no code implementations • 5 May 2013 • Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.