Search Results for author: Marc Lanctot

Found 52 papers, 28 papers with code

Fast computation of Nash Equilibria in Imperfect Information Games

no code implementations ICML 2020 Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls

We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.

Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

no code implementations 19 Feb 2024 Luca D'Amico-Wong, Hugh Zhang, Marc Lanctot, David C. Parkes

We propose ABCs (Adaptive Branching through Child stationarity), a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL), a classic reinforcement learning algorithm for single-agent domains, and counterfactual regret minimization (CFR), a central algorithm for learning in multi-agent domains.

counterfactual · OpenAI Gym · +1
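The Boltzmann half of ABCs is simple to picture: BQL selects actions from a softmax (Boltzmann) distribution over Q-values. A minimal, self-contained sketch of that action-selection rule (the Q-values and temperatures below are invented for illustration, not taken from the paper):

```python
import math

def boltzmann_policy(q_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over actions given their Q-values."""
    # Subtract the max Q-value for numerical stability before exponentiating.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical Q-values for three actions; lower temperature sharpens the policy.
probs = boltzmann_policy([1.0, 2.0, 0.5], temperature=0.5)
```

As the temperature tends to zero the policy approaches greedy action selection; as it grows, the policy approaches uniform exploration.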

States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers

1 code implementation 24 Jan 2024 Ian Gemp, Yoram Bachrach, Marc Lanctot, Roma Patel, Vibhavari Dasagi, Luke Marris, Georgios Piliouras, SiQi Liu, Karl Tuyls

A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language.

Imitation Learning

Neural Population Learning beyond Symmetric Zero-sum Games

no code implementations 10 Jan 2024 SiQi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess

We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.

Transfer Learning

Evaluating Agents using Social Choice Theory

1 code implementation 5 Dec 2023 Marc Lanctot, Kate Larson, Yoram Bachrach, Luke Marris, Zun Li, Avishkar Bhoopchand, Thomas Anthony, Brian Tanner, Anna Koop

We argue that many general evaluation problems can be viewed through the lens of voting theory.
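As one concrete instance of the voting-theory view, a standard rule such as Borda count can aggregate per-task rankings of agents into a single score (the agents, tasks, and ballots below are hypothetical, not from the paper):

```python
def borda_count(rankings):
    """Aggregate ranked ballots: the top of an n-candidate ballot gets n-1 points."""
    candidates = rankings[0]
    scores = {c: 0 for c in candidates}
    n = len(candidates)
    for ballot in rankings:
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    return scores

# Each "voter" is a task that ranks three hypothetical agents best-to-worst.
ballots = [
    ["agent_a", "agent_b", "agent_c"],  # task 1
    ["agent_a", "agent_c", "agent_b"],  # task 2
    ["agent_b", "agent_a", "agent_c"],  # task 3
]
scores = borda_count(ballots)
```

Here agent_a tops two of three ballots and wins the aggregate ranking.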

Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

1 code implementation 2 Mar 2023 Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy.

Decision Making · Language Modelling
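A basic building block of such a benchmark is scoring one mixed strategy against another via the game's payoff matrix; a minimal sketch for single-round rock-paper-scissors (the example strategies are invented):

```python
# Row player's payoff in rock-paper-scissors: +1 win, -1 loss, 0 tie.
PAYOFF = [[0, -1, 1],   # rock     vs (rock, paper, scissors)
          [1, 0, -1],   # paper
          [-1, 1, 0]]   # scissors

def expected_payoff(p, q):
    """Expected payoff of mixed strategy p against mixed strategy q."""
    return sum(p[i] * PAYOFF[i][j] * q[j] for i in range(3) for j in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]
rock_heavy = [0.6, 0.2, 0.2]
```

Because the game is zero-sum and symmetric, the uniform strategy scores exactly zero against any opponent, which is why non-uniform populations of bots are what make the benchmark interesting.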

Learning not to Regret

no code implementations 2 Mar 2023 David Sychrovský, Michal Šustr, Elnaz Davoodi, Michael Bowling, Marc Lanctot, Martin Schmid

As these similar games feature similar equilibria, we investigate a way to accelerate equilibrium finding on such a distribution.

Game Theoretic Rating in N-player general-sum games with Equilibria

no code implementations 5 Oct 2022 Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen Mcaleer, Jerome Connor, Karl Tuyls, Thore Graepel

Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting.

Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

no code implementations 22 Sep 2022 Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, SiQi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, Zhe Wang, Karl Tuyls

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, to training 3D humanoids in difficult team coordination tasks.

reinforcement-learning · Reinforcement Learning (RL)

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

3 code implementations 12 Jun 2022 Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.

MuJoCo Games · reinforcement-learning · +1
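For intuition about the mirror-descent machinery, here is one step of standard entropic mirror descent (the multiplicative-weights update) on the probability simplex, without the "magnet" regularization that distinguishes the paper's algorithm (the payoffs and step size are illustrative):

```python
import math

def entropic_md_step(strategy, payoffs, eta=0.5):
    """One entropic mirror descent step on the simplex:
    multiply each probability by exp(eta * payoff), then renormalize."""
    weights = [p * math.exp(eta * g) for p, g in zip(strategy, payoffs)]
    z = sum(weights)
    return [w / z for w in weights]

# Starting from uniform play, mass shifts toward higher-payoff actions.
new = entropic_md_step([1 / 3, 1 / 3, 1 / 3], payoffs=[1.0, 0.0, -1.0])
```

The update stays strictly inside the simplex, which is precisely why the negative-entropy mirror map is a natural fit for strategies in games.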

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

1 code implementation 8 Jun 2022 Stephen Mcaleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).

counterfactual
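The local learner underlying CFR-style methods is regret matching: play each action in proportion to its positive cumulative regret. A textbook self-play sketch in one-shot rock-paper-scissors, where the average strategies approach the uniform Nash equilibrium (this illustrates plain regret matching, not ESCHER itself):

```python
# Row player's payoff in rock-paper-scissors; the game is symmetric, so the
# same table gives either player's payoff for an action against the opponent.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def regret_strategy(regrets):
    """Regret matching: play in proportion to positive cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    return [p / total for p in positives] if total > 0 else [1 / 3] * 3

def action_values(opponent):
    """Expected payoff of each action against the opponent's mixed strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]

def self_play(iterations=50000):
    # Asymmetric initial regrets so the dynamics do not start at the fixed point.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_strategy(r) for r in regrets]
        for p in range(2):
            values = action_values(strategies[1 - p])
            expected = sum(strategies[p][a] * values[a] for a in range(3))
            for a in range(3):
                regrets[p][a] += values[a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

averages = self_play()
```

The current strategies cycle, but the time-averaged strategies converge to the equilibrium; CFR lifts exactly this learner to every information set of a sequential game.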

Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

no code implementations 31 May 2022 SiQi Liu, Marc Lanctot, Luke Marris, Nicolas Heess

Learning to play optimally against any mixture over a diverse set of strategies is of important practical interest in competitive games.

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

1 code implementation 24 May 2022 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

counterfactual · Decision Making

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations 19 Jan 2022 Stephen Mcaleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning · reinforcement-learning · +2

Dynamic population-based meta-learning for multi-agent communication with natural language

no code implementations NeurIPS 2021 Abhinav Gupta, Marc Lanctot, Angeliki Lazaridou

In this work, our goal is to train agents that can coordinate with seen, unseen as well as human partners in a multi-agent communication environment involving natural language.

Attribute · Meta-Learning · +1

Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

1 code implementation 17 Jun 2021 Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel

Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting.

Meta Learning for Multi-agent Communication

no code implementations ICLR Workshop Learning_to_Learn 2021 Abhinav Gupta, Angeliki Lazaridou, Marc Lanctot

Recent works have shown remarkable progress in training artificial agents to understand natural language, but they focus on using large amounts of raw data with huge compute requirements.

Meta-Learning · Meta Reinforcement Learning

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

1 code implementation 13 Feb 2021 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

counterfactual · Decision Making

Solving Common-Payoff Games with Approximate Policy Iteration

2 code implementations 11 Jan 2021 Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.

Multi-agent Reinforcement Learning · reinforcement-learning · +1

Hindsight and Sequential Rationality of Correlated Play

1 code implementation 10 Dec 2020 Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium.

counterfactual · Decision Making · +1

Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

no code implementations 13 Mar 2019 Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls

In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents.

counterfactual
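Exploitability, the quantity being descended, measures how much a best-responding opponent gains; in a one-shot symmetric zero-sum game it reduces to a best-response value (rock-paper-scissors stands in here for the extensive-form games of the paper):

```python
# Row player's payoff in rock-paper-scissors (a symmetric zero-sum game).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def exploitability(strategy):
    """Payoff a best-responding opponent secures against a fixed mixed strategy.

    In a symmetric zero-sum game with value 0, this is zero exactly at a
    Nash equilibrium, which for rock-paper-scissors is uniform play."""
    return max(
        sum(PAYOFF[b][a] * strategy[a] for a in range(3)) for b in range(3)
    )
```

Descending this quantity with respect to the strategy's parameters is the core idea named in the title; here it is merely evaluated.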

α-Rank: Multi-Agent Evaluation by Evolution

1 code implementation 4 Mar 2019 Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos

We introduce α-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs).

Mathematical Proofs
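The numerical core of ranking by evolutionary dynamics is the stationary distribution of a Markov chain over strategies, which can be computed by power iteration; the 2-strategy transition matrix below is made up for illustration and is not the α-Rank construction:

```python
def stationary_distribution(transition, iterations=1000):
    """Power-iterate a row-stochastic matrix toward its stationary distribution."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return dist

# Hypothetical 2-strategy chain; strategy 1 is harder to invade than strategy 0.
T = [[0.5, 0.5],
     [0.1, 0.9]]
dist = stationary_distribution(T)
```

Strategies that are hard to invade accumulate stationary mass, and that mass is what a ranking of this kind reports.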

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

no code implementations 9 Sep 2018 Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates.

counterfactual
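The baseline trick is a control variate: subtract a correlated quantity and add back its known expectation, leaving the estimate unbiased with lower variance. A generic Monte Carlo illustration of that principle (not the game-specific VR-MCCFR estimator):

```python
import random

rng = random.Random(42)
N = 5000

# Monte Carlo target: the mean of X = 3.0 + A + B, where A is observable
# noise with known mean 0.0 (the baseline) and B is small residual noise.
noise_a = [rng.gauss(0.0, 1.0) for _ in range(N)]
noise_b = [rng.gauss(0.0, 0.1) for _ in range(N)]
xs = [3.0 + a + b for a, b in zip(noise_a, noise_b)]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Control variate: subtract the baseline, add back its known mean (0.0).
# The estimator stays unbiased while most of the variance cancels.
corrected = [x - a + 0.0 for x, a in zip(xs, noise_a)]
```

In VR-MCCFR the role of A is played by learned baselines on sampled action values, and, as the abstract notes, baselines can even be bootstrapped from other estimates in the same episode without introducing bias.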

Emergent Communication through Negotiation

1 code implementation ICLR 2018 Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z. Leibo, Karl Tuyls, Stephen Clark

We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.

Multi-agent Reinforcement Learning

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

1 code implementation NeurIPS 2017 Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel

To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL).

reinforcement-learning · Reinforcement Learning (RL)

Deep Q-learning from Demonstrations

5 code implementations 12 Apr 2017 Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages even relatively small sets of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.

Imitation Learning · Q-Learning · +1
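The prioritized replay mechanism mentioned above samples transitions in proportion to a priority such as the TD error; a minimal sketch of proportional sampling (the priorities below are placeholders, and this omits the paper's specific priority schedule for demonstration data):

```python
import random

def sample_batch(priorities, batch_size, rng):
    """Sample transition indices with probability proportional to priority."""
    # random.choices accepts unnormalized weights.
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)

rng = random.Random(0)
# Hypothetical priorities: the last transition has a much larger TD error.
priorities = [1.0, 1.0, 8.0]
counts = [0, 0, 0]
for index in sample_batch(priorities, batch_size=10000, rng=rng):
    counts[index] += 1
```

High-priority transitions are replayed far more often, which is how the replay buffer can automatically rebalance between demonstration and self-generated data.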

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

4 code implementations 10 Feb 2017 Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel

We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.

Multi-agent Reinforcement Learning · reinforcement-learning · +1

Memory-Efficient Backpropagation Through Time

2 code implementations NeurIPS 2016 Audrūnas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs).

Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

no code implementations 2 Jun 2014 Marc Lanctot, Mark H. M. Winands, Tom Pepels, Nathan R. Sturtevant

In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough.
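For context, the standard MCTS selection rule that such backups feed into is UCB1: pick the child maximizing mean value plus an exploration bonus (a generic sketch, not the implicit-minimax backup itself):

```python
import math

def ucb1_select(child_values, child_visits, parent_visits, c=1.4):
    """Index of the child maximizing Q + c * sqrt(ln(N) / n), unvisited first."""
    best_index, best_score = -1, float("-inf")
    for i, (q, n) in enumerate(zip(child_values, child_visits)):
        if n == 0:
            score = float("inf")  # always try an unvisited child first
        else:
            score = q + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

The implicit-minimax idea replaces the plain mean value Q with a blend of the simulation average and a heuristic minimax backup, while the selection formula keeps this shape.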

Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling

1 code implementation 18 Jan 2014 Marc Ponsen, Steven de Jong, Marc Lanctot

Our second contribution relates to the observation that an NE is not a best response against players that are not playing an NE.

Computer Science and Game Theory

Convergence of Monte Carlo Tree Search in Simultaneous Move Games

no code implementations NeurIPS 2013 Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky

In this paper, we study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves.

Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

no code implementations NeurIPS 2012 Neil Burch, Marc Lanctot, Duane Szafron, Richard G. Gibson

In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy.

counterfactual

Variance Reduction in Monte-Carlo Tree Search

no code implementations NeurIPS 2011 Joel Veness, Marc Lanctot, Michael Bowling

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments.

Decision Making

Monte Carlo Sampling for Regret Minimization in Extensive Games

1 code implementation NeurIPS 2009 Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling

In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling.

counterfactual · Decision Making
