Search Results for author: Stephen McAleer

Found 25 papers, 9 papers with code

Game Theoretic Rating in N-player general-sum games with Equilibria

no code implementations • 5 Oct 2022 • Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen McAleer, Jerome Connor, Karl Tuyls, Thore Graepel

Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting.

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

1 code implementation • 16 Sep 2022 • Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
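The variance-reduction principle behind MeanQ, averaging an ensemble of value estimates before forming the TD target, can be illustrated with a toy calculation. This is a sketch of the general statistical effect only, not the paper's implementation; the Gaussian noise model is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: the true next-state value is 1.0, but each of K independent
# estimators observes it with noise (e.g. from minibatch sampling).
true_q, noise_std, k, n_samples = 1.0, 0.5, 8, 10_000

single = true_q + noise_std * rng.standard_normal(n_samples)
ensemble = (true_q + noise_std * rng.standard_normal((n_samples, k))).mean(axis=1)

# Averaging K independent estimates cuts the target variance by roughly K.
print(single.var() / ensemble.var())  # roughly k
```

Lower-variance targets in turn allow more aggressive training settings (e.g. smaller or no target-network delay), which is part of what the abstract's sample-efficiency numbers reflect.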

Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments

no code implementations • 19 Jul 2022 • JB Lanier, Stephen McAleer, Pierre Baldi, Roy Fox

In this paper, we propose Feasible Adversarial Robust RL (FARR), a novel problem formulation and objective for automatically determining the set of environment parameter values over which to be robust.

reinforcement-learning

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

no code implementations • 13 Jul 2022 • Stephen McAleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population.

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

1 code implementation • 8 Jun 2022 • Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

We show that the variance of the estimated regret of a tabular version of ESCHER with an oracle value function is significantly lower than that of outcome sampling MCCFR and tabular DREAM with an oracle value function.

Learning Risk-Averse Equilibria in Multi-Agent Systems

no code implementations • 30 May 2022 • Oliver Slumbers, David Henry Mguni, Stephen McAleer, Jun Wang, Yaodong Yang

In multi-agent systems, intelligent agents are tasked with making decisions that have optimal outcomes when the actions of the other agents are as expected, whilst also being prepared for unexpected behaviour.

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations • 19 Jan 2022 • Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning • reinforcement-learning
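The tabular double oracle (DO) loop that PSRO builds on can be sketched on a small matrix game: solve the restricted game, add each player's full-game pure best response, and stop when no new strategy enters. This is an illustrative reimplementation of the textbook DO procedure (not code from the paper), using SciPy's LP solver for the restricted-game equilibrium:

```python
import numpy as np
from scipy.optimize import linprog

# Rock-paper-scissors payoffs for the row player.
A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])

def maximin(M):
    """Row player's Nash mixture for zero-sum payoff matrix M, via an LP."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0             # maximize the game value v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])     # v <= x @ M[:, j] for all j
    A_eq = np.hstack([np.ones((1, m)), [[0.]]])   # mixture sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m]

rows, cols = [0], [0]                             # restricted strategy sets
while True:
    sub = A[np.ix_(rows, cols)]
    x = maximin(sub)                              # row equilibrium of subgame
    y = maximin(-sub.T)                           # column equilibrium
    br_row = int(np.argmax(A[:, cols] @ y))       # full-game best responses
    br_col = int(np.argmin(x @ A[rows, :]))
    if br_row in rows and br_col in cols:         # nothing new: converged
        break
    rows, cols = sorted(set(rows) | {br_row}), sorted(set(cols) | {br_col})

print(rows, cols, np.round(x, 3))                 # all actions enter; uniform mix
```

The abstract's caveat is visible in this structure: intermediate restricted equilibria (e.g. the pure "paper" mixture after one expansion) can be more exploitable in the full game than earlier iterates, which is what the anytime variant addresses.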

Target Entropy Annealing for Discrete Soft Actor-Critic

no code implementations • 6 Dec 2021 • Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.

Atari Games
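For context on the mechanism the title refers to: standard SAC (Haarnoja et al., 2018) tunes its entropy coefficient alpha by gradient descent so that policy entropy tracks a target entropy; this paper's contribution is annealing that target over training. The generic temperature step below is a sketch of the standard mechanism only, with an illustrative fixed target, not the paper's schedule:

```python
import numpy as np

def entropy(p):
    return float(-np.sum(p * np.log(p)))

def alpha_step(log_alpha, policy_probs, h_target, lr=0.1):
    """One gradient step on log(alpha): the temperature rises while the
    policy's entropy sits below the target, and falls when it is above."""
    grad = np.exp(log_alpha) * (entropy(policy_probs) - h_target)
    return log_alpha - lr * grad

near_greedy = np.array([0.97, 0.01, 0.01, 0.01])  # entropy well below target
log_a = 0.0
for _ in range(5):
    log_a = alpha_step(log_a, near_greedy, h_target=0.5 * np.log(4))
print(np.exp(log_a) > 1.0)                        # temperature increased
```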

Neural Auto-Curricula in Two-Player Zero-Sum Games

1 code implementation • NeurIPS 2021 • Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

no code implementations • 28 Oct 2021 • Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.

Q-Learning
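The "soft update" family this line of work builds on replaces the hard max in the TD target with a $\beta$-weighted softmax over action values. A minimal, generic version of that operator (not the paper's uncertainty-based scheduling of $\beta$) is:

```python
import numpy as np

def soft_value(q, beta):
    """Softmax-weighted backup over action values: recovers the mean of q as
    beta -> 0 (maximal smoothing) and the max of q as beta -> inf (greedy)."""
    w = np.exp(beta * (q - q.max()))  # shift by max for numerical stability
    w /= w.sum()
    return float(w @ q)

q = np.array([1.0, 2.0, 3.0])
print(soft_value(q, 0.0), soft_value(q, 100.0))  # mean vs. (near) max
```

Intermediate $\beta$ values interpolate between the two extremes, which is why a state-dependent schedule tied to model uncertainty is attractive: smooth where the estimates are noisy, greedy where they are trusted.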

Independent Natural Policy Gradient Always Converges in Markov Potential Games

no code implementations • 20 Oct 2021 • Roy Fox, Stephen McAleer, Will Overman, Ioannis Panageas

Recent results have shown that independent policy gradient converges in MPGs, but it was not known whether Independent Natural Policy Gradient converges in MPGs as well.

Multi-agent Reinforcement Learning

Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

no code implementations • 7 Jun 2021 • Stephen McAleer, John Lanier, Michael Dennis, Pierre Baldi, Roy Fox

Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests.

Neural Auto-Curricula

1 code implementation • 4 Jun 2021 • Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, Yaodong Yang

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.

Multi-agent Reinforcement Learning

Online Double Oracle

1 code implementation • 13 Mar 2021 • Le Cong Dinh, Yaodong Yang, Stephen McAleer, Zheng Tian, Nicolas Perez Nieves, Oliver Slumbers, David Henry Mguni, Haitham Bou Ammar, Jun Wang

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research, and artificial intelligence.

online learning

XDO: A Double Oracle Algorithm for Extensive-Form Games

1 code implementation • NeurIPS 2021 • Stephen McAleer, John Lanier, Kevin Wang, Pierre Baldi, Roy Fox

NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.

A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

no code implementations • 8 Feb 2021 • Forest Agostinelli, Alexander Shmakov, Stephen McAleer, Roy Fox, Pierre Baldi

Since the computation required to expand a node and compute the heuristic values for all of its generated children grows linearly with the size of the action space, A* search can become impractical for problems with large action spaces.
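The idea can be sketched in a toy domain: rank frontier (state, action) pairs by g + Q(s, a), so that heuristic values for all children of a state come from one Q evaluation, and generate a child state only when its pair is actually popped. Everything below (the integer-line domain, the exact cost-to-go stand-in for a learned Q-network) is an illustrative assumption, not the paper's setup:

```python
import heapq

# Toy domain: states are integers, the goal is 0, actions move by ±1 (unit cost).
ACTIONS = (-1, 1)

def q_value(s, a):
    # Stand-in for a learned Q-network: action cost plus the cost-to-go
    # from the resulting child (computed exactly here for illustration).
    return 1 + abs(s + a)

def q_star_search(start):
    """Best-first search over (state, action) pairs ranked by g + Q(s, a);
    a child state is instantiated only when its pair is popped."""
    if start == 0:
        return 0
    frontier = [(q_value(start, a), 0, start, a) for a in ACTIONS]
    heapq.heapify(frontier)
    seen = {start}
    while frontier:
        f, g, s, a = heapq.heappop(frontier)
        child = s + a                      # the only child we ever generate
        if child == 0:
            return g + 1
        if child in seen:
            continue
        seen.add(child)
        for b in ACTIONS:
            heapq.heappush(frontier, (g + 1 + q_value(child, b), g + 1, child, b))
    return None

print(q_star_search(5))  # optimal cost: 5
```

With an action space of size |A|, a plain A* expansion would generate |A| children and run |A| heuristic evaluations per node; here each popped pair triggers a single child generation, which is the saving the abstract describes.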

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

2 code implementations • NeurIPS 2020 • Stephen McAleer, John Lanier, Roy Fox, Pierre Baldi

We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.

reinforcement-learning

ColosseumRL: A Framework for Multiagent Reinforcement Learning in $N$-Player Games

no code implementations • 10 Dec 2019 • Alexander Shmakov, John Lanier, Stephen McAleer, Rohan Achar, Cristina Lopes, Pierre Baldi

Much of recent success in multiagent reinforcement learning has been in two-player zero-sum games.

Multiagent Systems

Curiosity-Driven Multi-Criteria Hindsight Experience Replay

1 code implementation • 9 Jun 2019 • John B. Lanier, Stephen McAleer, Pierre Baldi

Dealing with sparse rewards is a longstanding challenge in reinforcement learning.

reinforcement-learning
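The core hindsight trick can be sketched generically: relabel transitions from a failed episode with goals the agent actually achieved, so the sparse reward fires and the trajectory becomes informative. This is a sketch of the standard "future" relabeling strategy only; the 1-D episode is an illustrative assumption, and the paper's multi-criteria and curiosity components are not shown:

```python
import random

def her_relabel(episode, reward_fn, k=4, rng=random):
    """For each (state, action, next_state, goal) transition, also store up to
    k copies whose goal is a state achieved later in the same episode."""
    relabeled = []
    for t, (state, action, next_state, goal) in enumerate(episode):
        relabeled.append((state, action, next_state, goal,
                          reward_fn(next_state, goal)))
        future = episode[t:]
        for _ in range(min(k, len(future))):
            _, _, achieved, _ = rng.choice(future)  # hindsight goal
            relabeled.append((state, action, next_state, achieved,
                              reward_fn(next_state, achieved)))
    return relabeled

# Sparse reward: 0 on reaching the goal, -1 otherwise.
reward = lambda s, g: 0.0 if s == g else -1.0

# A failed 1-D episode: the agent walks 0 -> 1 -> 2, but the goal was 5.
episode = [(0, +1, 1, 5), (1, +1, 2, 5)]
out = her_relabel(episode, reward, k=2)
print(any(r == 0.0 for *_, r in out))  # True: some relabeled rewards fire
```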

Solving the Rubik's Cube with Approximate Policy Iteration

no code implementations • ICLR 2019 • Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi

Autodidactic Iteration learns how to solve the Rubik's Cube and the 15-puzzle without relying on human data.

Solving the Rubik's Cube Without Human Knowledge

9 code implementations • 18 May 2018 • Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi

A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision.

Combinatorial Optimization • reinforcement-learning
