no code implementations • 2 Sep 2024 • Esraa Elelimy, Adam White, Michael Bowling, Martha White
Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments.
no code implementations • 27 Jun 2024 • Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling
We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment.
1 code implementation • 20 Jun 2024 • Simone Parisi, Alireza Kazemipour, Michael Bowling
Exploration in reinforcement learning (RL) remains an open challenge.
1 code implementation • 9 Feb 2024 • Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, Michael Bowling
In this paper, we formalize a novel but general RL framework - Monitored MDPs - where the agent cannot always observe rewards.
no code implementations • 12 Nov 2023 • Zahra Bashir, Michael Bowling, Levi H. S. Lelis
The LLM then formulates a natural language explanation of the program.
no code implementations • 16 Oct 2023 • Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls
The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC.
2 code implementations • 16 Oct 2023 • Diego Gomez, Michael Bowling, Marlos C. Machado
The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging.
no code implementations • 2 Mar 2023 • David Sychrovský, Michal Šustr, Elnaz Davoodi, Michael Bowling, Marc Lanctot, Martin Schmid
As these similar games feature similar equilibria, we investigate a way to accelerate equilibrium finding on such a distribution.
1 code implementation • 23 Feb 2023 • Alexandre Trudeau, Michael Bowling
AlphaZero trains on self-play matches beginning from the initial state of a game and samples actions only over the first few moves, limiting its exploration of states deeper in the game tree.
no code implementations • 20 Dec 2022 • Michael Bowling, John D. Martin, David Abel, Will Dabney
The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)."
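One standard way to write the quantity the hypothesis refers to is the expected discounted return; the discount factor gamma and the reward notation below are conventional assumptions, not part of the quoted text:

J(\pi) = \mathbb{E}_\pi \left[ \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \right]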
no code implementations • 2 Nov 2022 • Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W Mathewson, Ivana Kajic, Michael Bowling, Todd D Murphey, Patrick M Pilarski
While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other.
no code implementations • 23 Aug 2022 • Richard S. Sutton, Michael Bowling, Patrick M. Pilarski
Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan.
no code implementations • 4 Jun 2022 • Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald
Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory.
1 code implementation • 24 May 2022 • Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.
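As a point of reference, regret matching is one canonical no-regret learning dynamic of the kind mentioned above. The sketch below is illustrative only; the paper considers richer deviation sets and mediated equilibria that this toy update does not implement.

import numpy as np

def regret_matching_policy(cumulative_regret):
    # Play each action with probability proportional to its positive cumulative regret;
    # fall back to the uniform policy when no action has positive regret.
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total <= 0.0:
        return np.full(cumulative_regret.shape, 1.0 / cumulative_regret.size)
    return positive / total

Time-averaging the play produced by such a dynamic is what connects low regret to the equilibrium notions discussed in these papers.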
no code implementations • 22 May 2022 • Esra'a Saleh, John D. Martin, Anna Koop, Arash Pourzarabi, Michael Bowling
We focus our investigations on Dyna-style planning in a prediction setting.
no code implementations • 6 Dec 2021 • Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, G. Zacharias Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling
Games have a long history as benchmarks for progress in artificial intelligence.
no code implementations • 15 Nov 2021 • Dustin Morrill, Amy R. Greenwald, Michael Bowling
We introduce the partially observable history process (POHP) formalism for reinforcement learning.
no code implementations • 29 Oct 2021 • Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, Michael Bowling
An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously.
no code implementations • 12 Oct 2021 • Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling
In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions.
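As a rough illustration of that encoding, a tabular successor representation can be learned with a TD-style update. This is a minimal sketch under assumed state indexing and hyperparameters, not the method proposed in the paper.

import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.95
# psi[s, s_prime] estimates the expected discounted number of future visits to s_prime starting from s.
psi = np.zeros((n_states, n_states))

def sr_update(s, s_next):
    # One-hot indicator of the current state plus the bootstrapped occupancy of the next state.
    target = np.eye(n_states)[s] + gamma * psi[s_next]
    psi[s] += alpha * (target - psi[s])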
1 code implementation • 13 Feb 2021 • Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.
2 code implementations • 11 Jan 2021 • Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot
While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.
1 code implementation • 10 Dec 2020 • Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling
This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium.
no code implementations • 2 Nov 2020 • Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, Michael Bowling
Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered.
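For reference, the potential-based construction referred to here shapes the reward with a potential function \Phi over states:

F(s, a, s') = \gamma \Phi(s') - \Phi(s)

The shaping terms telescope along any trajectory, so every policy's return changes only by an amount determined by the start state, leaving the ordering of policies, and hence the optimal policy, unchanged.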
no code implementations • 27 Aug 2020 • Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls
In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior.
no code implementations • NeurIPS 2020 • Zaheen Farraz Ahmad, Levi H. S. Lelis, Michael Bowling
Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces.
no code implementations • 28 Apr 2020 • Katya Kudashkina, Valliappa Chockalingam, Graham W. Taylor, Michael Bowling
Human-computer interactive systems that rely on machine learning are becoming paramount to the lives of millions of people who use digital assistants on a daily basis.
no code implementations • 20 Apr 2020 • Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling
In prior games research, agent evaluation has often focused on in-practice game outcomes.
no code implementations • 6 Dec 2019 • Ryan D'Orazio, Dustin Morrill, James R. Wright, Michael Bowling
In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and yields a regret bound with a better dependence on the number of actions.
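Concretely, the softmax parameterization referenced here maps an action-preference vector \theta to a policy:

\pi_\theta(a) = \frac{\exp(\theta_a)}{\sum_{b} \exp(\theta_b)}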
no code implementations • ICML 2020 • Trevor Davis, Martin Schmid, Michael Bowling
In this paper, we extend recent work that uses baseline estimates to reduce this variance.
no code implementations • 26 Jun 2019 • Vojtěch Kovařík, Martin Schmid, Neil Burch, Michael Bowling, Viliam Lisý
A second issue is that while EFGs have recently seen significant algorithmic progress, their classical formalization is unsuitable for efficient presentation of the underlying ideas, such as those around decomposition.
1 code implementation • NeurIPS 2019 • Fushan Li, Michael Bowling
Artificial agents have been shown to learn to communicate when needed to complete a cooperative task.
1 code implementation • 1 Feb 2019 • Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making.
1 code implementation • 4 Nov 2018 • Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling
We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.
1 code implementation • NeurIPS 2018 • Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence.
1 code implementation • 29 Sep 2018 • Jesse Farebrother, Marlos C. Machado, Michael Bowling
Deep reinforcement learning algorithms have shown an impressive ability to learn complex control policies in high-dimensional tasks.
no code implementations • 20 Sep 2018 • Trevor Davis, Kevin Waugh, Michael Bowling
Extensive-form games are a common model for multiagent interactions with imperfect information.
no code implementations • 9 Sep 2018 • Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling
The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates.
2 code implementations • ICLR 2019 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling
In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required.
Ranked #16 on the Atari 2600 Venture task (Atari Games benchmark)
no code implementations • 5 Jun 2018 • G. Zacharias Holland, Erin J. Talvitie, Michael Bowling
Dyna is a fundamental approach to model-based reinforcement learning (MBRL) that interleaves planning, acting, and learning in an online setting.
7 code implementations • 18 Sep 2017 • Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling
The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games.
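To give a rough sense of the interaction loop the platform exposes, here is a minimal random agent using the later ale-py Python bindings; the package choice, ROM path, and episode count are assumptions of this sketch, not details from the paper.

import random
from ale_py import ALEInterface

ale = ALEInterface()
ale.loadROM("roms/breakout.bin")       # hypothetical local ROM path
actions = ale.getLegalActionSet()

for episode in range(3):               # assumed small number of episodes
    total_reward = 0.0
    while not ale.game_over():
        # Act uniformly at random and accumulate the returned reward.
        total_reward += ale.act(random.choice(actions))
    ale.reset_game()
    print(episode, total_reward)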
1 code implementation • ICML 2017 • Marlos C. Machado, Marc G. Bellemare, Michael Bowling
Representation learning and option discovery are two of the biggest challenges in reinforcement learning (RL).
1 code implementation • 6 Jan 2017 • Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling
Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence.
no code implementations • 22 Dec 2016 • Viliam Lisy, Michael Bowling
Approximating a Nash equilibrium is currently the best performing approach for creating poker-playing programs.
Computer Science and Game Theory
no code implementations • 20 Dec 2016 • Neil Burch, Martin Schmid, Matej Moravčík, Michael Bowling
Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available.
no code implementations • NeurIPS 2016 • Kieran Milan, Joel Veness, James Kirkpatrick, Michael Bowling, Anna Koop, Demis Hassabis
We introduce the Forget-me-not Process, an efficient, non-parametric meta-algorithm for online probabilistic sequence prediction for piecewise stationary, repeating sources.
no code implementations • 25 May 2016 • Marlos C. Machado, Michael Bowling
In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress.
1 code implementation • 4 Dec 2015 • Yitao Liang, Marlos C. Machado, Erik Talvitie, Michael Bowling
The recently introduced Deep Q-Networks (DQN) algorithm has gained attention as one of the first successful combinations of deep neural networks and reinforcement learning.
no code implementations • 28 Nov 2014 • Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, Michael Bowling
We propose a novel online learning method for minimizing regret in large extensive-form games.
no code implementations • 16 Oct 2014 • Marlos C. Machado, Sriram Srinivasan, Michael Bowling
In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration.
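A minimal sketch of the idea, assuming a tabular Q-learner and a hypothetical upper bound on the return:

import collections

Q_MAX = 1.0   # assumed upper bound on the return for the task (hypothetical value)
alpha, gamma = 0.1, 0.99

# Optimistic initialization: unseen state-action pairs start at the upper bound,
# so a greedy learner is drawn toward actions it has not yet tried.
Q = collections.defaultdict(lambda: Q_MAX)

def q_update(s, a, r, s_next, actions):
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])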
no code implementations • NeurIPS 2012 • Katherine Chen, Michael Bowling
Robust policy optimization acknowledges that risk-aversion plays a vital role in real-world decision-making.
no code implementations • NeurIPS 2012 • Marc Bellemare, Joel Veness, Michael Bowling
Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions.
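A toy illustration of the collision issue; this is not the paper's estimator, and the table size and hashing scheme are chosen only to make collisions visible.

TABLE_SIZE = 16                       # deliberately tiny so distinct states collide
values = [0.0] * TABLE_SIZE
alpha = 0.1

def hashed_index(state):
    return hash(state) % TABLE_SIZE

def update(state, target):
    i = hashed_index(state)
    values[i] += alpha * (target - values[i])

# When two distinct states hash to the same index, updates for one state also move
# the estimate reported for the other, which is the source of the bias in the estimates.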
3 code implementations • 19 Jul 2012 • Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling
We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning.
Ranked #1 on the Atari 2600 Pooyan task (Atari Games benchmark)
no code implementations • NeurIPS 2011 • Joel Veness, Marc Lanctot, Michael Bowling
Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments.
1 code implementation • 14 Nov 2011 • Joel Veness, Kee Siong Ng, Marcus Hutter, Michael Bowling
This paper describes the Context Tree Switching technique, a modification of Context Tree Weighting for the prediction of binary, stationary, n-Markov sources.
Information Theory
1 code implementation • NeurIPS 2009 • Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling
In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling.
no code implementations • NeurIPS 2009 • Kevin Waugh, Nolan Bard, Michael Bowling
A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size.
1 code implementation • NeurIPS 2007 • Michael Johanson, Martin Zinkevich, Michael Bowling
Adaptation to other initially unknown agents often requires computing an effective counter-strategy.