Search Results for author: Marc Lanctot

Found 42 papers, 22 papers with code

Fast computation of Nash Equilibria in Imperfect Information Games

no code implementations ICML 2020 Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls

We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

no code implementations 12 Jun 2022 Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

Moreover, applied as a tabular Nash equilibrium solver via self-play, we show empirically that MMD produces results competitive with CFR in both normal-form and extensive-form games with full feedback (this is the first time that a standard RL algorithm has done so) and also that MMD empirically converges in black-box feedback settings.

MuJoCo Games

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

no code implementations 8 Jun 2022 Stephen Mcaleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

We show that the variance of the estimated regret of a tabular version of ESCHER with an oracle value function is significantly lower than that of outcome sampling MCCFR and tabular DREAM with an oracle value function.

Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

no code implementations 31 May 2022 SiQi Liu, Marc Lanctot, Luke Marris, Nicolas Heess

Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games.

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

1 code implementation 24 May 2022 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

Decision Making

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations 19 Jan 2022 Stephen Mcaleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning reinforcement-learning

Dynamic population-based meta-learning for multi-agent communication with natural language

no code implementations NeurIPS 2021 Abhinav Gupta, Marc Lanctot, Angeliki Lazaridou

In this work, our goal is to train agents that can coordinate with seen, unseen as well as human partners in a multi-agent communication environment involving natural language.

Meta-Learning Text Generation

Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

1 code implementation 17 Jun 2021 Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel

Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting.

Meta Learning for Multi-agent Communication

no code implementations ICLR Workshop Learning_to_Learn 2021 Abhinav Gupta, Angeliki Lazaridou, Marc Lanctot

Recent works have shown remarkable progress in training artificial agents to understand natural language but are focused on using large amounts of raw data involving huge compute requirements.

Meta-Learning Meta Reinforcement Learning

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

1 code implementation 13 Feb 2021 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

Decision Making

Solving Common-Payoff Games with Approximate Policy Iteration

2 code implementations 11 Jan 2021 Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.

Multi-agent Reinforcement Learning reinforcement-learning

Hindsight and Sequential Rationality of Correlated Play

1 code implementation 10 Dec 2020 Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium.

Decision Making

Neural Replicator Dynamics

1 code implementation 1 Jun 2019 Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls

Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning.

Policy Gradient Methods

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

no code implementations 13 Mar 2019 Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents.

α-Rank: Multi-Agent Evaluation by Evolution

1 code implementation 4 Mar 2019 Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos

We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs).

Mathematical Proofs

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

no code implementations 9 Sep 2018 Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates.

Emergent Communication through Negotiation

1 code implementation ICLR 2018 Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z. Leibo, Karl Tuyls, Stephen Clark

We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.

Multi-agent Reinforcement Learning reinforcement-learning

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

1 code implementation NeurIPS 2017 Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel

To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL).

reinforcement-learning

Deep Q-learning from Demonstrations

5 code implementations 12 Apr 2017 Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

Decision Making Imitation Learning

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

3 code implementations 10 Feb 2017 Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel

We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.

Multi-agent Reinforcement Learning reinforcement-learning

Memory-Efficient Backpropagation Through Time

2 code implementations NeurIPS 2016 Audrūnas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs).

Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

no code implementations 2 Jun 2014 Marc Lanctot, Mark H. M. Winands, Tom Pepels, Nathan R. Sturtevant

In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough.

Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling

1 code implementation 18 Jan 2014 Marc Ponsen, Steven de Jong, Marc Lanctot

Our second contribution relates to the observation that a NE is not a best response against players that are not playing a NE.

Computer Science and Game Theory

Convergence of Monte Carlo Tree Search in Simultaneous Move Games

no code implementations NeurIPS 2013 Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky

In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves.

Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

no code implementations NeurIPS 2012 Neil Burch, Marc Lanctot, Duane Szafron, Richard G. Gibson

In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy.

Variance Reduction in Monte-Carlo Tree Search

no code implementations NeurIPS 2011 Joel Veness, Marc Lanctot, Michael Bowling

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments.

Decision Making

Monte Carlo Sampling for Regret Minimization in Extensive Games

1 code implementation NeurIPS 2009 Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling

In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling.

Decision Making
