1 code implementation • 2 Mar 2023 • Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat
Progress in the fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy.
1 code implementation • 30 Jun 2022 • Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls
Stratego has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes).
no code implementations • 6 Dec 2021 • Martin Schmid, Matej Moravčík, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, G. Zacharias Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling
Games have a long history as benchmarks for progress in artificial intelligence.
2 code implementations • 11 Jan 2021 • Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot
While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.
Multi-agent Reinforcement Learning
Reinforcement Learning
no code implementations • 28 Nov 2020 • Edward Lockhart, Neil Burch, Nolan Bard, Sebastian Borgeaud, Tom Eccles, Lucas Smaira, Ray Smith
We introduce a human-compatible reinforcement-learning approach to a cooperative game, making use of a third-party hand-coded human-compatible bot to generate initial training data and to perform initial evaluation.
no code implementations • 20 Apr 2020 • Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling
In prior games research, agent evaluation has often focused on in-practice game outcomes.
no code implementations • 19 Feb 2020 • Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls
In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG).
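As a toy illustration of these dynamics (a sketch only, not the paper's construction, which treats sequential IIGs): with a negative-entropy regularizer, discretized FTRL on a two-player zero-sum matrix game reduces to a softmax over cumulative payoffs, and the time-averaged strategies approach a Nash equilibrium. All names and parameter values below are illustrative.

```python
import numpy as np

def softmax(v):
    z = np.exp(v - v.max())
    return z / z.sum()

def ftrl_dynamics(payoff, steps=20000, eta=0.1):
    """Discretized Follow the Regularized Leader with an entropy
    regularizer on a two-player zero-sum matrix game.

    `payoff` is the row player's payoff matrix; the column player
    receives its negation.  With negative entropy as the regularizer,
    the FTRL iterate is a softmax over cumulative payoffs."""
    m, n = payoff.shape
    cum_row, cum_col = np.zeros(m), np.zeros(n)  # cumulative payoffs
    avg_row, avg_col = np.zeros(m), np.zeros(n)  # time-averaged strategies
    for _ in range(steps):
        x, y = softmax(eta * cum_row), softmax(eta * cum_col)
        cum_row += payoff @ y        # row player's payoff against y
        cum_col += -payoff.T @ x     # column player's (zero-sum) payoff
        avg_row += x
        avg_col += y
    return avg_row / steps, avg_col / steps

# Rock-paper-scissors: time-averaged play approaches the uniform equilibrium.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
print(ftrl_dynamics(rps))
```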
no code implementations • 26 Jun 2019 • Vojtěch Kovařík, Martin Schmid, Neil Burch, Michael Bowling, Viliam Lisý
A second issue is that while EFGs have recently seen significant algorithmic progress, their classical formalization is unsuitable for efficient presentation of the underlying ideas, such as those around decomposition.
2 code implementations • 1 Feb 2019 • Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making.
1 code implementation • 4 Nov 2018 • Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling
We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment (a minimal sketch of this belief update appears below).
Multi-agent Reinforcement Learning
Policy Gradient Methods
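A minimal sketch of the kind of public-belief update BAD builds on, assuming a small discrete space of private states and a known policy; in BAD itself the update is approximate and the policy is learned, and all names below are illustrative.

```python
import numpy as np

def public_belief_update(belief, action, policy_given_private):
    """One exact Bayesian step of a public belief over private states.

    belief: prior P(private state), shape (S,)
    action: index of the observed public action
    policy_given_private: P(action | private state), shape (S, A),
        the (assumed known) policy all agents condition on.
    Returns the posterior P(private state | observed action)."""
    likelihood = policy_given_private[:, action]
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0.0:  # observed action had zero probability under the prior
        return np.full_like(belief, 1.0 / belief.size)
    return posterior / total

# Two equally likely private states; the policy makes action 0
# much more likely under state 0.
belief = np.array([0.5, 0.5])
policy = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
print(public_belief_update(belief, 0, policy))  # ~[0.818, 0.182]
```

The zero-probability fallback is a pragmatic choice for the sketch; a real implementation must decide how to treat off-policy observations.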
no code implementations • 9 Sep 2018 • Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravčík, Rudolf Kadlec, Michael Bowling
The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates.
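The mechanism is a control variate: a baseline is subtracted at the sampled action and added back in expectation, so any baseline leaves the estimate unbiased while a good one shrinks variance. Below is a minimal single-decision sketch under assumed names; the paper applies this recursively along sampled trajectories, with the child value itself a bootstrapped estimate.

```python
import random

def baseline_corrected_value(policy, sampling, baseline, sampled_child_value):
    """Unbiased estimate of E_{a~policy}[u(a)] from one sampled action.

    `sampled_child_value(a)` supplies an unbiased (possibly itself
    bootstrapped) estimate of u(a) for the sampled action.  The baseline
    b(a) is subtracted where sampled and added back in expectation, so
    the estimate stays unbiased for ANY baseline, while b ~ u reduces
    variance -- the control-variate trick behind baseline-corrected
    MCCFR estimators."""
    actions = list(policy)
    a_star = random.choices(actions, weights=[sampling[a] for a in actions])[0]
    estimate = sum(policy[a] * baseline[a] for a in actions)
    correction = (policy[a_star] / sampling[a_star]) * (
        sampled_child_value(a_star) - baseline[a_star])
    return estimate + correction

# Sanity check: the Monte Carlo mean matches the true expectation.
policy   = {'a': 0.7, 'b': 0.3}
sampling = {'a': 0.5, 'b': 0.5}
baseline = {'a': 1.0, 'b': -1.0}
true_u   = {'a': 1.2, 'b': -0.8}
vals = [baseline_corrected_value(policy, sampling, baseline,
                                 lambda a: true_u[a]) for _ in range(100000)]
print(sum(vals) / len(vals))  # ~0.60 = 0.7*1.2 + 0.3*(-0.8)
```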
1 code implementation • 6 Jan 2017 • Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling
Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence.
no code implementations • 20 Dec 2016 • Neil Burch, Martin Schmid, Matej Moravčík, Michael Bowling
Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available.
no code implementations • 15 Jan 2014 • Uzi Zahavi, Ariel Felner, Neil Burch, Robert C. Holte
In this paper we show that, in addition to requiring the heuristic to be consistent, their formula's predictions are accurate only at levels of the brute-force search tree where the heuristic values obey the unconditional distribution that they defined and then used in their formula.
no code implementations • NeurIPS 2012 • Neil Burch, Marc Lanctot, Duane Szafron, Richard G. Gibson
In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy.
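A hedged sketch of the AS sampling rule: each of the updating player's actions is kept independently with probability driven by its cumulative average-strategy weight, floored at an exploration parameter. Parameter values and names below are illustrative, not the paper's tuned settings.

```python
import random

def sample_actions_avg_strategy(cum_strategy, epsilon=0.05, beta=1.0, tau=1.0):
    """Average Strategy Sampling: pick which of the updating player's
    actions to descend into.  Each action a is sampled independently with
        rho(a) = clip(epsilon, (beta + tau * s(a)) / (beta + sum_b s(b)), 1)
    where s holds the cumulative average-strategy weights.  Early on
    (s small, beta dominant) every action is explored; later, sampling
    concentrates on actions the average strategy favours."""
    total = beta + sum(cum_strategy.values())
    sampled = []
    for a, s_a in cum_strategy.items():
        rho = min(1.0, max(epsilon, (beta + tau * s_a) / total))
        if random.random() < rho:
            sampled.append(a)
    return sampled  # callers may want to handle the empty-subset case

# Cumulative weights after some iterations: 'fold' is rarely played,
# so it is visited only at the epsilon floor.
print(sample_actions_avg_strategy({'fold': 2.0, 'call': 950.0, 'raise': 400.0}))
```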
1 code implementation • 4 Jul 2012 • Finnegan Southey, Michael P. Bowling, Bryce Larson, Carmelo Piccione, Neil Burch, Darse Billings, Chris Rayner
We demonstrate methods for playing effective responses to the opponent, based on the posterior.
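As a generic sketch of responding to a posterior over opponent models (the paper's response methods over full poker strategy spaces are more involved; all names and values below are assumptions):

```python
import numpy as np

def posterior_over_models(prior, likelihoods):
    """P(model | observations) from P(model) and P(observations | model)."""
    post = prior * likelihoods
    return post / post.sum()

def bayesian_response(prior, likelihoods, value_matrix):
    """Pick the response maximizing expected value under the posterior.

    value_matrix[i, j]: our payoff for response i against opponent
    model j.  This responds to the posterior-weighted mixture of
    opponent models rather than to any single point estimate."""
    post = posterior_over_models(prior, likelihoods)
    expected = value_matrix @ post
    return int(np.argmax(expected)), post

prior = np.array([0.5, 0.5])          # two candidate opponent models
likelihoods = np.array([0.02, 0.10])  # P(observed play | model)
values = np.array([[ 3.0, -1.0],      # payoff of response i vs model j
                   [-2.0,  4.0]])
best, post = bayesian_response(prior, likelihoods, values)
print(best, post)  # response 1; posterior ~[0.167, 0.833]
```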