Search Results for author: Marc Lanctot

Found 52 papers, 28 papers with code

Fast computation of Nash Equilibria in Imperfect Information Games

no code implementations ICML 2020 Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls

We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.

Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

no code implementations 19 Feb 2024 Luca D'Amico-Wong, Hugh Zhang, Marc Lanctot, David C. Parkes

We propose ABCs (Adaptive Branching through Child stationarity), a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL), a classic reinforcement learning algorithm for single-agent domains, and counterfactual regret minimization (CFR), a central algorithm for learning in multi-agent domains.

counterfactual · OpenAI Gym · +1
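The Boltzmann half of ABCs is simple to picture: BQL selects actions from a softmax (Boltzmann) distribution over Q-values. A minimal, self-contained sketch of that action-selection rule (the Q-values and temperatures below are invented for illustration, not taken from the paper):

```python
import math

def boltzmann_policy(q_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over actions given their Q-values."""
    # Subtract the max Q-value for numerical stability before exponentiating.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical Q-values for three actions; lower temperature sharpens the policy.
probs = boltzmann_policy([1.0, 2.0, 0.5], temperature=0.5)
```

As the temperature tends to zero the policy approaches greedy action selection; as it grows, the policy approaches uniform exploration.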

States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers

1 code implementation 24 Jan 2024 Ian Gemp, Yoram Bachrach, Marc Lanctot, Roma Patel, Vibhavari Dasagi, Luke Marris, Georgios Piliouras, SiQi Liu, Karl Tuyls

A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language.

Imitation Learning

Neural Population Learning beyond Symmetric Zero-sum Games

no code implementations 10 Jan 2024 SiQi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess

We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.

Transfer Learning

Evaluating Agents using Social Choice Theory

1 code implementation 5 Dec 2023 Marc Lanctot, Kate Larson, Yoram Bachrach, Luke Marris, Zun Li, Avishkar Bhoopchand, Thomas Anthony, Brian Tanner, Anna Koop

We argue that many general evaluation problems can be viewed through the lens of voting theory.
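As one concrete instance of the voting-theory view, a standard rule such as Borda count can aggregate per-task rankings of agents into a single score (the agents, tasks, and ballots below are hypothetical, not from the paper):

```python
def borda_count(rankings):
    """Aggregate ranked ballots: the top of an n-candidate ballot gets n-1 points."""
    candidates = rankings[0]
    scores = {c: 0 for c in candidates}
    n = len(candidates)
    for ballot in rankings:
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    return scores

# Each "voter" is a task that ranks three hypothetical agents best-to-worst.
ballots = [
    ["agent_a", "agent_b", "agent_c"],  # task 1
    ["agent_a", "agent_c", "agent_b"],  # task 2
    ["agent_b", "agent_a", "agent_c"],  # task 3
]
scores = borda_count(ballots)
```

Here agent_a tops two of three ballots and wins the aggregate ranking.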

Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

1 code implementation 2 Mar 2023 Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy.

Decision Making · Language Modelling
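A basic building block of such a benchmark is scoring one mixed strategy against another via the game's payoff matrix; a minimal sketch for single-round rock-paper-scissors (the example strategies are invented):

```python
# Row player's payoff in rock-paper-scissors: +1 win, -1 loss, 0 tie.
PAYOFF = [[0, -1, 1],   # rock     vs (rock, paper, scissors)
          [1, 0, -1],   # paper
          [-1, 1, 0]]   # scissors

def expected_payoff(p, q):
    """Expected payoff of mixed strategy p against mixed strategy q."""
    return sum(p[i] * PAYOFF[i][j] * q[j] for i in range(3) for j in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]
rock_heavy = [0.6, 0.2, 0.2]
```

Because the game is zero-sum and symmetric, the uniform strategy scores exactly zero against any opponent, which is why non-uniform populations of bots are what make the benchmark interesting.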

Learning not to Regret

no code implementations 2 Mar 2023 David Sychrovský, Michal Šustr, Elnaz Davoodi, Michael Bowling, Marc Lanctot, Martin Schmid

As these similar games feature similar equilibria, we investigate a way to accelerate equilibrium finding on such a distribution.

Game Theoretic Rating in N-player general-sum games with Equilibria

no code implementations 5 Oct 2022 Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen Mcaleer, Jerome Connor, Karl Tuyls, Thore Graepel

Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting.

Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

no code implementations 22 Sep 2022 Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, SiQi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, Zhe Wang, Karl Tuyls

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, to training 3D humanoids in difficult team coordination tasks.

reinforcement-learning · Reinforcement Learning (RL)

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

3 code implementations 12 Jun 2022 Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.

MuJoCo Games · reinforcement-learning · +1
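For intuition about the mirror-descent machinery, here is one step of standard entropic mirror descent (the multiplicative-weights update) on the probability simplex, without the "magnet" regularization that distinguishes the paper's algorithm (the payoffs and step size are illustrative):

```python
import math

def entropic_md_step(strategy, payoffs, eta=0.5):
    """One entropic mirror descent step on the simplex:
    multiply each probability by exp(eta * payoff), then renormalize."""
    weights = [p * math.exp(eta * g) for p, g in zip(strategy, payoffs)]
    z = sum(weights)
    return [w / z for w in weights]

# Starting from uniform play, mass shifts toward higher-payoff actions.
new = entropic_md_step([1 / 3, 1 / 3, 1 / 3], payoffs=[1.0, 0.0, -1.0])
```

The update stays strictly inside the simplex, which is precisely why the negative-entropy mirror map is a natural fit for strategies in games.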

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

1 code implementation 8 Jun 2022 Stephen Mcaleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).

counterfactual
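The local learner underlying CFR-style methods is regret matching: play each action in proportion to its positive cumulative regret. A textbook self-play sketch in one-shot rock-paper-scissors, where the average strategies approach the uniform Nash equilibrium (this illustrates plain regret matching, not ESCHER itself):

```python
# Row player's payoff in rock-paper-scissors; the game is symmetric, so the
# same table gives either player's payoff for an action against the opponent.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def regret_strategy(regrets):
    """Regret matching: play in proportion to positive cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    return [p / total for p in positives] if total > 0 else [1 / 3] * 3

def action_values(opponent):
    """Expected payoff of each action against the opponent's mixed strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]

def self_play(iterations=50000):
    # Asymmetric initial regrets so the dynamics do not start at the fixed point.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_strategy(r) for r in regrets]
        for p in range(2):
            values = action_values(strategies[1 - p])
            expected = sum(strategies[p][a] * values[a] for a in range(3))
            for a in range(3):
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

averages = self_play()
```

The current strategies cycle, but the time-averaged strategies converge to the equilibrium; CFR lifts exactly this learner to every information set of a sequential game.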

Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

no code implementations 31 May 2022 SiQi Liu, Marc Lanctot, Luke Marris, Nicolas Heess

Learning to play optimally against any mixture over a diverse set of strategies is of important practical interest in competitive games.

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

1 code implementation 24 May 2022 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

counterfactual · Decision Making

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations 19 Jan 2022 Stephen Mcaleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning · reinforcement-learning · +2

Dynamic population-based meta-learning for multi-agent communication with natural language

no code implementations NeurIPS 2021 Abhinav Gupta, Marc Lanctot, Angeliki Lazaridou

In this work, our goal is to train agents that can coordinate with seen, unseen as well as human partners in a multi-agent communication environment involving natural language.

Attribute · Meta-Learning · +1

Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

1 code implementation 17 Jun 2021 Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel

Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting.

Meta Learning for Multi-agent Communication

no code implementations ICLR Workshop Learning_to_Learn 2021 Abhinav Gupta, Angeliki Lazaridou, Marc Lanctot

Recent works have shown remarkable progress in training artificial agents to understand natural language, but they focus on using large amounts of raw data with huge compute requirements.

Meta-Learning · Meta Reinforcement Learning

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

1 code implementation 13 Feb 2021 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

counterfactual · Decision Making

Solving Common-Payoff Games with Approximate Policy Iteration

2 code implementations 11 Jan 2021 Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.

Multi-agent Reinforcement Learning · reinforcement-learning · +1

Hindsight and Sequential Rationality of Correlated Play

1 code implementation 10 Dec 2020 Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium.

counterfactual · Decision Making · +1

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

no code implementations 13 Mar 2019 Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents.

counterfactual
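Exploitability, the quantity being descended, measures how much a best-responding opponent gains; in a one-shot symmetric zero-sum game it reduces to a best-response value (rock-paper-scissors stands in here for the extensive-form games of the paper):

```python
# Row player's payoff in rock-paper-scissors (a symmetric zero-sum game).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def exploitability(strategy):
    """Payoff a best-responding opponent secures against a fixed mixed strategy.

    In a symmetric zero-sum game with value 0, this is zero exactly at a
    Nash equilibrium, which for rock-paper-scissors is uniform play."""
    return max(
        sum(PAYOFF[b][a] * strategy[a] for a in range(3)) for b in range(3)
    )
```

Descending this quantity with respect to the strategy's parameters is the core idea named in the title; here it is merely evaluated.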

α-Rank: Multi-Agent Evaluation by Evolution

1 code implementation 4 Mar 2019 Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos

We introduce α-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs).

Mathematical Proofs
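The numerical core of ranking by evolutionary dynamics is the stationary distribution of a Markov chain over strategies, which can be computed by power iteration; the 2-strategy transition matrix below is made up for illustration and is not the α-Rank construction:

```python
def stationary_distribution(transition, iterations=1000):
    """Power-iterate a row-stochastic matrix toward its stationary distribution."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return dist

# Hypothetical 2-strategy chain; strategy 1 is harder to invade than strategy 0.
T = [[0.5, 0.5],
     [0.1, 0.9]]
dist = stationary_distribution(T)
```

Strategies that are hard to invade accumulate stationary mass, and that mass is what a ranking of this kind reports.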

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

no code implementations 9 Sep 2018 Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates.

counterfactual
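The baseline trick is a control variate: subtract a correlated quantity and add back its known expectation, leaving the estimate unbiased with lower variance. A generic Monte Carlo illustration of that principle (not the game-specific VR-MCCFR estimator):

```python
import random

rng = random.Random(42)
N = 5000

# Monte Carlo target: the mean of X = 3.0 + A + B, where A is observable
# noise with known mean 0.0 (the baseline) and B is small residual noise.
noise_a = [rng.gauss(0.0, 1.0) for _ in range(N)]
noise_b = [rng.gauss(0.0, 0.1) for _ in range(N)]
xs = [3.0 + a + b for a, b in zip(noise_a, noise_b)]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Control variate: subtract the baseline, add back its known mean (0.0).
# The estimator stays unbiased while most of the variance cancels.
corrected = [x - a + 0.0 for x, a in zip(xs, noise_a)]
```

In VR-MCCFR the role of A is played by learned baselines on sampled action values, and, as the abstract notes, baselines can even be bootstrapped from other estimates in the same episode without introducing bias.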

Emergent Communication through Negotiation

1 code implementation ICLR 2018 Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z. Leibo, Karl Tuyls, Stephen Clark

We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.

Multi-agent Reinforcement Learning

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

1 code implementation NeurIPS 2017 Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel

To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL).

reinforcement-learning · Reinforcement Learning (RL)

Deep Q-learning from Demonstrations

5 code implementations 12 Apr 2017 Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages even relatively small sets of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.

Imitation Learning · Q-Learning · +1
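The prioritized replay mechanism mentioned above samples transitions in proportion to a priority such as the TD error; a minimal sketch of proportional sampling (the priorities below are placeholders, and this omits the paper's specific priority schedule for demonstration data):

```python
import random

def sample_batch(priorities, batch_size, rng):
    """Sample transition indices with probability proportional to priority."""
    # random.choices accepts unnormalized weights.
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)

rng = random.Random(0)
# Hypothetical priorities: the last transition has a much larger TD error.
priorities = [1.0, 1.0, 8.0]
counts = [0, 0, 0]
for index in sample_batch(priorities, batch_size=10000, rng=rng):
    counts[index] += 1
```

High-priority transitions are replayed far more often, which is how the replay buffer can automatically rebalance between demonstration and self-generated data.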

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

4 code implementations 10 Feb 2017 Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel

We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.

Multi-agent Reinforcement Learning · reinforcement-learning · +1

Memory-Efficient Backpropagation Through Time

2 code implementations NeurIPS 2016 Audrūnas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs).

Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

no code implementations 2 Jun 2014 Marc Lanctot, Mark H. M. Winands, Tom Pepels, Nathan R. Sturtevant

In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough.
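For context, the standard MCTS selection rule that such backups feed into is UCB1: pick the child maximizing mean value plus an exploration bonus (a generic sketch, not the implicit-minimax backup itself):

```python
import math

def ucb1_select(child_values, child_visits, parent_visits, c=1.4):
    """Index of the child maximizing Q + c * sqrt(ln(N) / n), unvisited first."""
    best_index, best_score = -1, float("-inf")
    for i, (q, n) in enumerate(zip(child_values, child_visits)):
        if n == 0:
            score = float("inf")  # always try an unvisited child first
        else:
            score = q + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

The implicit-minimax idea replaces the plain mean value Q with a blend of the simulation average and a heuristic minimax backup, while the selection formula keeps this shape.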

Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling

1 code implementation 18 Jan 2014 Marc Ponsen, Steven de Jong, Marc Lanctot

Our second contribution relates to the observation that an NE is not a best response against players that are not playing an NE.

Computer Science and Game Theory

Convergence of Monte Carlo Tree Search in Simultaneous Move Games

no code implementations NeurIPS 2013 Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky

In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves.

Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

no code implementations NeurIPS 2012 Neil Burch, Marc Lanctot, Duane Szafron, Richard G. Gibson

In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy.

counterfactual

Variance Reduction in Monte-Carlo Tree Search

no code implementations NeurIPS 2011 Joel Veness, Marc Lanctot, Michael Bowling

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments.

Decision Making

Monte Carlo Sampling for Regret Minimization in Extensive Games

1 code implementation NeurIPS 2009 Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling

In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling.

counterfactual · Decision Making
