no code implementations • ICML 2020 • Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls
We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.
no code implementations • 12 Jun 2022 • Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer
Moreover, applied as a tabular Nash equilibrium solver via self-play, we show empirically that MMD produces results competitive with CFR in both normal-form and extensive-form games with full feedback (this is the first time that a standard RL algorithm has done so) and also that MMD empirically converges in black-box feedback settings.
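As a concrete illustration of the self-play use described above, here is a minimal MMD-style update on rock-paper-scissors, using the closed-form magnetic mirror descent step under a negative-entropy mirror map; the step size, magnet strength, and uniform magnet below are illustrative assumptions, not the paper's tuned configuration:

```python
# Minimal sketch of an MMD-style self-play update on a zero-sum matrix game.
# ETA, ALPHA, and the uniform magnet are illustrative choices.
import numpy as np

def mmd_step(pi, q, magnet, eta, alpha):
    # Closed-form update under the negative-entropy mirror map:
    # pi' ∝ (pi * magnet**(eta*alpha) * exp(eta*q))**(1 / (1 + eta*alpha))
    logits = (np.log(pi) + eta * alpha * np.log(magnet) + eta * q) / (1 + eta * alpha)
    z = np.exp(logits - logits.max())
    return z / z.sum()

A = np.array([[0.0, -1.0, 1.0],    # rock-paper-scissors payoffs (row player)
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
x = y = magnet = np.ones(3) / 3
ETA, ALPHA = 0.1, 0.05
for _ in range(2000):
    x, y = (mmd_step(x, A @ y, magnet, ETA, ALPHA),
            mmd_step(y, -A.T @ x, magnet, ETA, ALPHA))
print(x, y)  # both near the uniform Nash equilibrium of RPS
```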
no code implementations • 8 Jun 2022 • Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
We show that the variance of the estimated regret of a tabular version of ESCHER with an oracle value function is significantly lower than that of outcome sampling MCCFR and tabular DREAM with an oracle value function.
no code implementations • 31 May 2022 • Siqi Liu, Marc Lanctot, Luke Marris, Nicolas Heess
Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games.
1 code implementation • 24 May 2022 • Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.
no code implementations • 19 Jan 2022 • Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox
PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.
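For reference, a minimal sketch of the tabular double oracle loop on a zero-sum matrix game, with restricted games solved by linear programming (the game matrix and iteration cap are illustrative; this is the method PSRO generalizes, not PSRO itself):

```python
# Double oracle (DO) sketch for a two-player zero-sum matrix game.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Row player's Nash strategy and game value via an LP."""
    m, n = A.shape
    c = np.zeros(m + 1); c[-1] = -1.0               # variables (x, v); minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])       # v - x^T A[:, j] <= 0 for all j
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0  # sum(x) == 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], -res.fun

def double_oracle(A, max_iters=100):
    rows, cols = [0], [0]                           # restricted strategy sets
    for _ in range(max_iters):
        sub = A[np.ix_(rows, cols)]
        x, _ = solve_zero_sum(sub)                  # row meta-strategy
        y, _ = solve_zero_sum(-sub.T)               # column meta-strategy
        br_row = int(np.argmax(A[:, cols] @ y))     # best responses in the full game
        br_col = int(np.argmin(x @ A[rows, :]))
        if br_row in rows and br_col in cols:
            break                                   # no improving deviation: converged
        if br_row not in rows: rows.append(br_row)
        if br_col not in cols: cols.append(br_col)
    return rows, cols, x, y

rows, cols, x, y = double_oracle(np.random.default_rng(0).normal(size=(8, 8)))
```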
no code implementations • 6 Dec 2021 • Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, Zach Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling
Games have a long history of serving as a benchmark for progress in artificial intelligence.
no code implementations • NeurIPS 2021 • Abhinav Gupta, Marc Lanctot, Angeliki Lazaridou
In this work, our goal is to train agents that can coordinate with seen, unseen as well as human partners in a multi-agent communication environment involving natural language.
1 code implementation • 17 Jun 2021 • Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel
Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting.
no code implementations • ICLR Workshop Learning_to_Learn 2021 • Abhinav Gupta, Angeliki Lazaridou, Marc Lanctot
Recent works have shown remarkable progress in training artificial agents to understand natural language, but they rely on large amounts of raw data and correspondingly large compute requirements.
2 code implementations • 11 Jan 2021 • Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot
While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.
1 code implementation • 10 Dec 2020 • Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling
This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium.
no code implementations • ICLR 2019 • Yoram Bachrach, Richard Everett, Edward Hughes, Angeliki Lazaridou, Joel Z. Leibo, Marc Lanctot, Michael Johanson, Wojciech M. Czarnecki, Thore Graepel
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals.
no code implementations • 27 Aug 2020 • Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls
In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior.
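The tabular no-regret primitive underlying this family is regret matching; below is a minimal self-play sketch on rock-paper-scissors (illustrative only, not the paper's model-free deep RL method), where the average strategies converge to the Nash equilibrium:

```python
# Regret-matching self-play on a zero-sum matrix game; the *average*
# strategies converge to a Nash equilibrium.
import numpy as np

def rm_strategy(regrets):
    pos = np.maximum(regrets, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.ones_like(pos) / len(pos)

A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])  # RPS
reg_x = np.zeros(3); reg_y = np.zeros(3)
avg_x = np.zeros(3); avg_y = np.zeros(3)
for _ in range(10000):
    x, y = rm_strategy(reg_x), rm_strategy(reg_y)
    u_x, u_y = A @ y, -A.T @ x          # action values for each player
    reg_x += u_x - x @ u_x              # accumulate instantaneous regret
    reg_y += u_y - y @ u_y
    avg_x += x; avg_y += y
print(avg_x / avg_x.sum(), avg_y / avg_y.sum())  # ≈ uniform Nash of RPS
```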
2 code implementations • NeurIPS 2020 • Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach
It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms.
no code implementations • 20 Apr 2020 • Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling
In prior games research, agent evaluation has often focused on in-practice game outcomes.
no code implementations • 19 Feb 2020 • Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls
In this paper we investigate the Follow the Regularized Leader (FoReL) dynamics in sequential imperfect information games (IIGs).
1 code implementation • ICLR 2020 • Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO).
13 code implementations • 26 Aug 2019 • Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
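For orientation, a short random-rollout loop against OpenSpiel's Python API (`pyspiel`):

```python
# Random rollout of a game using OpenSpiel's Python API.
import random
import pyspiel

game = pyspiel.load_game("kuhn_poker")
state = game.new_initial_state()
while not state.is_terminal():
    if state.is_chance_node():
        # Sample a chance outcome according to its probability.
        outcomes, probs = zip(*state.chance_outcomes())
        state.apply_action(random.choices(outcomes, probs)[0])
    else:
        state.apply_action(random.choice(state.legal_actions()))
print(state.returns())  # terminal payoffs, one entry per player
```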
1 code implementation • 1 Jun 2019 • Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning.
no code implementations • 13 Mar 2019 • Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls
In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents.
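A normal-form analogue is easy to sketch: each player takes a projected gradient step on its value against the opponent's current best response. The projection routine and decaying step size below are standard choices, simplifications of the paper's extensive-form treatment:

```python
# Exploitability-descent-style updates on a normal-form game.
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (Duchi et al. style).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])  # RPS
x = np.ones(3) / 3; y = np.ones(3) / 3
for t in range(5000):
    br_y = np.eye(3)[np.argmin(x @ A)]   # column best response to x
    br_x = np.eye(3)[np.argmax(A @ y)]   # row best response to y
    lr = 1.0 / np.sqrt(t + 1)
    x = project_simplex(x + lr * (A @ br_y))    # ascend vs worst-case opponent
    y = project_simplex(y + lr * (-A.T @ br_x))
print(x, y)
```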
1 code implementation • 4 Mar 2019 • Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos
We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs).
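A heavily simplified single-population sketch of the ranking computation, assuming a coarse logistic fixation model in place of the paper's exact finite-population fixation probabilities:

```python
# Simplified single-population alpha-Rank-style ranking: a Markov chain over
# pure strategies with a logistic invasion approximation, ranked by its
# stationary distribution. The fixation model is a rough simplification.
import numpy as np

def alpha_rank_simple(M, alpha=10.0, eta=None):
    """M[i, j]: payoff to strategy i against strategy j."""
    k = M.shape[0]
    eta = eta if eta is not None else 1.0 / (k - 1)
    C = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                # Chance that a j-mutant overtakes a population playing i.
                C[i, j] = eta / (1.0 + np.exp(-alpha * (M[j, i] - M[i, j])))
        C[i, i] = 1.0 - C[i].sum()
    # Stationary distribution: left eigenvector of C for eigenvalue 1.
    vals, vecs = np.linalg.eig(C.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return np.abs(pi) / np.abs(pi).sum()   # higher mass = higher rank
```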
no code implementations • 2 Mar 2019 • Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel
Evolution has produced a multi-scale mosaic of interacting adaptive units.
2 code implementations • 1 Feb 2019 • Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making.
1 code implementation • NeurIPS 2018 • Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence.
no code implementations • 9 Sep 2018 • Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling
The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates.
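A tiny numerical demo of the underlying control-variate idea: an importance-sampled estimate corrected by a baseline stays unbiased, and its variance shrinks as the baseline approaches the true values (the numbers here are illustrative):

```python
# Baseline-corrected sampled value estimates: unbiased, lower variance.
import numpy as np

rng = np.random.default_rng(0)
values = np.array([1.0, -2.0, 0.5])    # true action values
policy = np.array([0.2, 0.5, 0.3])     # sampling distribution

def estimate(baseline):
    a = rng.choice(3, p=policy)
    est = baseline.copy()
    est[a] += (values[a] - baseline[a]) / policy[a]  # correct sampled action
    return policy @ est                              # estimated state value

for b in (np.zeros(3), values * 0.9):  # no baseline vs near-perfect baseline
    samples = np.array([estimate(b) for _ in range(20000)])
    print(f"mean={samples.mean():.3f}  var={samples.var():.3f}")
# Both means ≈ policy @ values = -0.65; variance is far lower with the
# good baseline.
```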
1 code implementation • ICLR 2018 • Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z. Leibo, Karl Tuyls, Stephen Clark
We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.
53 code implementations • 5 Dec 2017 • David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
The game of chess is the most widely-studied domain in the history of artificial intelligence.
Ranked #1 on Game of Go (ELO Ratings)
1 code implementation • NeurIPS 2017 • Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel
To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL).
6 code implementations • 16 Jun 2017 • Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel
We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal.
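A minimal value-decomposition sketch in the spirit of this approach: the joint action value is the sum of per-agent utilities, trained from the single team reward (network sizes and the batch interface are assumptions for illustration):

```python
# VDN-style additive value decomposition with a joint TD loss.
import torch
import torch.nn as nn

class AgentQ(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, obs):
        return self.net(obs)

q1, q2 = AgentQ(8, 4), AgentQ(8, 4)
opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=1e-3)

def td_loss(o1, a1, o2, a2, team_reward, done, no1, no2, gamma=0.99):
    # Batched tensors: obs (B, 8), actions (B, 1), reward/done (B, 1).
    q_tot = q1(o1).gather(1, a1) + q2(o2).gather(1, a2)   # additive mixing
    with torch.no_grad():
        next_q = (q1(no1).max(1, keepdim=True).values
                  + q2(no2).max(1, keepdim=True).values)
        target = team_reward + gamma * (1 - done) * next_q
    return nn.functional.mse_loss(q_tot, target)
```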
5 code implementations • 12 Apr 2017 • Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys
We present Deep Q-learning from Demonstrations (DQfD), an algorithm that leverages small amounts of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning via a prioritized replay mechanism.
3 code implementations • 10 Feb 2017 • Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel
We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.
2 code implementations • NeurIPS 2016 • Audrūnas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves
We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs).
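A simple fixed-interval variant of the idea, using PyTorch's gradient checkpointing to recompute each chunk of the unrolled RNN during the backward pass (the paper's contribution is a dynamic-programming schedule over such memory/compute trade-offs; this sketch only shows the recomputation mechanism):

```python
# Memory-reduced BPTT via per-chunk recomputation of RNN activations.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

rnn = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

def run_chunk(x_chunk, h):
    out, h_new = rnn(x_chunk, h)
    return out, h_new

x = torch.randn(4, 1024, 32)                 # (batch, time, features)
h = torch.zeros(1, 4, 64, requires_grad=True)
outputs = []
for chunk in torch.split(x, 128, dim=1):     # 8 chunks of 128 steps each
    out, h = checkpoint(run_chunk, chunk, h, use_reentrant=False)
    outputs.append(out)
loss = torch.cat(outputs, dim=1).pow(2).mean()
loss.backward()                              # chunk activations recomputed here
```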
no code implementations • 8 Jun 2016 • Chrisantha Fernando, Dylan Banarse, Malcolm Reynolds, Frederic Besse, David Pfau, Max Jaderberg, Marc Lanctot, Daan Wierstra
In this work we introduce a differentiable version of the Compositional Pattern Producing Network, called the DPPN.
68 code implementations • 20 Nov 2015 • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
In recent years there have been many successes of using deep representations in reinforcement learning.
Ranked #2 on Atari Games (Atari 2600 Pong)
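The aggregation at the heart of the dueling architecture is Q(s, a) = V(s) + A(s, a) − mean over a' of A(s, a'); a minimal head implementing it (the torso size is an arbitrary illustrative choice):

```python
# Dueling network head: separate value and advantage streams, recombined
# with a mean-subtracted advantage for identifiability.
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, in_dim, n_actions):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # state-value stream
        self.advantage = nn.Linear(128, n_actions)   # advantage stream
    def forward(self, x):
        h = self.torso(x)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)

q = DuelingHead(16, 6)(torch.randn(2, 16))   # -> Q-values of shape (2, 6)
```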
no code implementations • 2 Jun 2014 • Marc Lanctot, Mark H. M. Winands, Tom Pepels, Nathan R. Sturtevant
In recent years, combining ideas from traditional minimax search with MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough.
1 code implementation • 18 Jan 2014 • Marc Ponsen, Steven de Jong, Marc Lanctot
Our second contribution relates to the observation that a Nash equilibrium (NE) is not a best response against players who are not themselves playing a NE.
no code implementations • NeurIPS 2013 • Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky
In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves.
no code implementations • NeurIPS 2012 • Neil Burch, Marc Lanctot, Duane Szafron, Richard G. Gibson
In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy.
no code implementations • NeurIPS 2011 • Joel Veness, Marc Lanctot, Michael Bowling
Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments.
1 code implementation • NeurIPS 2009 • Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling
In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling.