Search Results for author: Michael Bowling

Found 57 papers, 22 papers with code

Real-Time Recurrent Learning using Trace Units in Reinforcement Learning

no code implementations 2 Sep 2024 Esraa Elelimy, Adam White, Michael Bowling, Martha White

Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments.

reinforcement-learning

Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning

no code implementations 27 Jun 2024 Bradley Burega, John D. Martin, Luke Kapeluck, Michael Bowling

We study how a Reinforcement Learning (RL) system can remain sample-efficient when learning from an imperfect model of the environment.

Reinforcement Learning (RL)

Monitored Markov Decision Processes

1 code implementation 9 Feb 2024 Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, Michael Bowling

In this paper, we formalize a novel but general RL framework - Monitored MDPs - where the agent cannot always observe rewards.

Reinforcement Learning (RL)

Proper Laplacian Representation Learning

2 code implementations 16 Oct 2023 Diego Gomez, Michael Bowling, Marlos C. Machado

The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging.

Representation Learning

Learning not to Regret

no code implementations 2 Mar 2023 David Sychrovský, Michal Šustr, Elnaz Davoodi, Michael Bowling, Marc Lanctot, Martin Schmid

As these similar games feature similar equilibria, we investigate a way to accelerate equilibrium finding on such a distribution.

Targeted Search Control in AlphaZero for Effective Policy Improvement

1 code implementation 23 Feb 2023 Alexandre Trudeau, Michael Bowling

AlphaZero trains upon self-play matches beginning from the initial state of a game and only samples actions over the first few moves, limiting its exploration of states deeper in the game tree.

Settling the Reward Hypothesis

no code implementations 20 Dec 2022 Michael Bowling, John D. Martin, David Abel, Will Dabney

The reward hypothesis posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)."

Over-communicate no more: Situated RL agents learn concise communication protocols

no code implementations 2 Nov 2022 Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W Mathewson, Ivana Kajic, Michael Bowling, Todd D Murphey, Patrick M Pilarski

While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other.

Reinforcement Learning (RL)

The Alberta Plan for AI Research

no code implementations 23 Aug 2022 Richard S. Sutton, Michael Bowling, Patrick M. Pilarski

Herein we describe our approach to artificial intelligence research, which we call the Alberta Plan.

Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration

no code implementations 4 Jun 2022 Dustin Morrill, Esra'a Saleh, Michael Bowling, Amy Greenwald

Neural replicator dynamics (NeuRD) is an alternative to the foundational softmax policy gradient (SPG) algorithm motivated by online learning and evolutionary game theory.

Decision Making

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

1 code implementation 24 May 2022 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

counterfactual Decision Making

The Partially Observable History Process

no code implementations 15 Nov 2021 Dustin Morrill, Amy R. Greenwald, Michael Bowling

We introduce the partially observable history process (POHP) formalism for reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Learning to Be Cautious

no code implementations 29 Oct 2021 Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, Michael Bowling

An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously.

counterfactual Safe Reinforcement Learning +1

Temporal Abstraction in Reinforcement Learning with the Successor Representation

no code implementations 12 Oct 2021 Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling

In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions.

reinforcement-learning Reinforcement Learning (RL)

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

1 code implementation 13 Feb 2021 Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria.

counterfactual Decision Making

Solving Common-Payoff Games with Approximate Policy Iteration

2 code implementations 11 Jan 2021 Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.

Decoder Multi-agent Reinforcement Learning +2

Hindsight and Sequential Rationality of Correlated Play

1 code implementation 10 Dec 2020 Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium.

counterfactual Decision Making +1

Useful Policy Invariant Shaping from Arbitrary Advice

no code implementations 2 Nov 2020 Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, Michael Bowling

Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered.

Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

no code implementations NeurIPS 2020 Zaheen Farraz Ahmad, Levi H. S. Lelis, Michael Bowling

Generating good candidate actions is critical to the success of sample-based planners, particularly in continuous or large action spaces.

Action Generation

Sample-Efficient Model-based Actor-Critic for an Interactive Dialogue Task

no code implementations 28 Apr 2020 Katya Kudashkina, Valliappa Chockalingam, Graham W. Taylor, Michael Bowling

Human-computer interactive systems that rely on machine learning are becoming paramount to the lives of millions of people who use digital assistants on a daily basis.

Model-based Reinforcement Learning

Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

no code implementations 6 Dec 2019 Ryan D'Orazio, Dustin Morrill, James R. Wright, Michael Bowling

In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and yields a regret bound with a better dependence on the number of actions.

counterfactual regression +2

Rethinking Formal Models of Partially Observable Multiagent Decision Making

no code implementations 26 Jun 2019 Vojtěch Kovařík, Martin Schmid, Neil Burch, Michael Bowling, Viliam Lisý

A second issue is that while EFGs have recently seen significant algorithmic progress, their classical formalization is unsuitable for efficient presentation of the underlying ideas, such as those around decomposition.

counterfactual Decision Making +1

Ease-of-Teaching and Language Structure from Emergent Communication

1 code implementation NeurIPS 2019 Fushan Li, Michael Bowling

Artificial agents have been shown to learn to communicate when needed to complete a cooperative task.

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

1 code implementation 4 Nov 2018 Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling

We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.

Decoder Multi-agent Reinforcement Learning +3

Generalization and Regularization in DQN

1 code implementation 29 Sep 2018 Jesse Farebrother, Marlos C. Machado, Michael Bowling

Deep reinforcement learning algorithms have shown an impressive ability to learn complex control policies in high-dimensional tasks.

Atari Games Benchmarking +2

Solving Large Extensive-Form Games with Strategy Constraints

no code implementations 20 Sep 2018 Trevor Davis, Kevin Waugh, Michael Bowling

Extensive-form games are a common model for multiagent interactions with imperfect information.

counterfactual

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

no code implementations 9 Sep 2018 Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates.

counterfactual

Count-Based Exploration with the Successor Representation

2 code implementations ICLR 2019 Marlos C. Machado, Marc G. Bellemare, Michael Bowling

In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required.

Atari Games Efficient Exploration +1

The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces

no code implementations 5 Jun 2018 G. Zacharias Holland, Erin J. Talvitie, Michael Bowling

Dyna is a fundamental approach to model-based reinforcement learning (MBRL) that interleaves planning, acting, and learning in an online setting.

Atari Games Model-based Reinforcement Learning

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

7 code implementations 18 Sep 2017 Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games.

Atari Games

DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

1 code implementation 6 Jan 2017 Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling

Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence.

Game of Poker

Equilibrium Approximation Quality of Current No-Limit Poker Bots

no code implementations 22 Dec 2016 Viliam Lisy, Michael Bowling

Approximating a Nash equilibrium is currently the best performing approach for creating poker-playing programs.

Computer Science and Game Theory

AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

no code implementations 20 Dec 2016 Neil Burch, Martin Schmid, Matej Moravčík, Michael Bowling

Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available.

The Forget-me-not Process

no code implementations NeurIPS 2016 Kieran Milan, Joel Veness, James Kirkpatrick, Michael Bowling, Anna Koop, Demis Hassabis

We introduce the Forget-me-not Process, an efficient, non-parametric meta-algorithm for online probabilistic sequence prediction for piecewise stationary, repeating sources.

Learning Purposeful Behaviour in the Absence of Rewards

no code implementations 25 May 2016 Marlos C. Machado, Michael Bowling

In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress.

State of the Art Control of Atari Games Using Shallow Reinforcement Learning

1 code implementation 4 Dec 2015 Yitao Liang, Marlos C. Machado, Erik Talvitie, Michael Bowling

The recently introduced Deep Q-Networks (DQN) algorithm has gained attention as one of the first successful combinations of deep neural networks and reinforcement learning.

Atari Games reinforcement-learning +1

Solving Games with Functional Regret Estimation

no code implementations 28 Nov 2014 Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, Michael Bowling

We propose a novel online learning method for minimizing regret in large extensive-form games.

Domain-Independent Optimistic Initialization for Reinforcement Learning

no code implementations 16 Oct 2014 Marlos C. Machado, Sriram Srinivasan, Michael Bowling

In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration.

reinforcement-learning Reinforcement Learning (RL)

Tractable Objectives for Robust Policy Optimization

no code implementations NeurIPS 2012 Katherine Chen, Michael Bowling

Robust policy optimization acknowledges that risk-aversion plays a vital role in real-world decision-making.

Decision Making

Sketch-Based Linear Value Function Approximation

no code implementations NeurIPS 2012 Marc Bellemare, Joel Veness, Michael Bowling

Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions.

Atari Games reinforcement-learning +1

The Arcade Learning Environment: An Evaluation Platform for General Agents

3 code implementations 19 Jul 2012 Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling

We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning.

Atari Games Benchmarking +4

Variance Reduction in Monte-Carlo Tree Search

no code implementations NeurIPS 2011 Joel Veness, Marc Lanctot, Michael Bowling

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments.

Decision Making

Context Tree Switching

1 code implementation 14 Nov 2011 Joel Veness, Kee Siong Ng, Marcus Hutter, Michael Bowling

This paper describes the Context Tree Switching technique, a modification of Context Tree Weighting for the prediction of binary, stationary, n-Markov sources.

Information Theory

Monte Carlo Sampling for Regret Minimization in Extensive Games

1 code implementation NeurIPS 2009 Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling

In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling.

counterfactual Decision Making

Strategy Grafting in Extensive Games

no code implementations NeurIPS 2009 Kevin Waugh, Nolan Bard, Michael Bowling

A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size.

Computing Robust Counter-Strategies

1 code implementation NeurIPS 2007 Michael Johanson, Martin Zinkevich, Michael Bowling

Adaptation to other initially unknown agents often requires computing an effective counter-strategy.
