Search Results for author: Noam Brown

Found 26 papers, 11 papers with code

The Update-Equivalence Framework for Decision-Time Planning

no code implementations 25 Apr 2023 Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.

Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

no code implementations 22 Jan 2023 Samuel Sokota, Ryan D'Orazio, Chun Kai Ling, David J. Wu, J. Zico Kolter, Noam Brown

Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and yields a simplified framework for decision-time planning in two-player zero-sum games, void of the unappealing properties that plague existing decision-time planning approaches.

Human-AI Coordination via Human-Regularized Search and Learning

no code implementations 11 Oct 2022 Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown

First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams.

Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

1 code implementation 11 Oct 2022 Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown

We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model.

Reinforcement Learning (RL)

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

3 code implementations 12 Jun 2022 Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.

MuJoCo Games, Reinforcement Learning (RL) +1
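
For readers who want the mechanics, the closed-form magnetic mirror descent update on the probability simplex (with the negative-entropy mirror map) is easy to state. The sketch below is a minimal illustration only; the step size eta, magnet strength alpha, the uniform magnet, and the toy linear cost are assumptions made for this example, not values from the paper.

```python
import numpy as np

def mmd_simplex_step(x, grad, magnet, eta=0.1, alpha=0.05):
    """One magnetic mirror descent step on the probability simplex
    (negative-entropy mirror map). The iterate follows the loss gradient
    while being pulled toward the 'magnet' distribution."""
    logits = (np.log(x) + eta * alpha * np.log(magnet) - eta * grad) / (1.0 + eta * alpha)
    x_new = np.exp(logits - logits.max())   # subtract max for numerical stability
    return x_new / x_new.sum()

# Toy usage (illustrative numbers): descend a fixed linear cost <c, x>
# while staying close to the uniform magnet distribution.
x = np.full(3, 1.0 / 3.0)
magnet = np.full(3, 1.0 / 3.0)
cost_grad = np.array([1.0, 0.5, 0.0])
for _ in range(100):
    x = mmd_simplex_step(x, cost_grad, magnet)
print(x)  # mass concentrates on the low-cost action, softened by the magnet
```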

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

no code implementations 14 Dec 2021 Athul Paul Jacob, David J. Wu, Gabriele Farina, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Jacob Andreas, Noam Brown

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior.

Imitation Learning
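
A one-step picture of the KL-regularized idea: maximizing expected action value minus lambda times the KL divergence to an imitation-learned anchor policy yields a Boltzmann-like policy proportional to anchor(a) * exp(Q(a) / lambda). The snippet below sketches only that regularized step, not the paper's full search procedure; the Q-values, anchor probabilities, and lambda values are made up for illustration.

```python
import numpy as np

def kl_regularized_policy(q_values, anchor, lam):
    """Policy maximizing  E_pi[Q] - lam * KL(pi || anchor).
    Larger lam pulls the policy toward the human-like anchor;
    smaller lam approaches the greedy argmax policy."""
    logits = np.log(anchor) + q_values / lam
    p = np.exp(logits - logits.max())
    return p / p.sum()

q = np.array([1.0, 0.8, 0.1])          # hypothetical action values from search
anchor = np.array([0.2, 0.7, 0.1])     # hypothetical imitation-learned policy
print(kl_regularized_policy(q, anchor, lam=0.5))   # value-seeking but anchor-biased
print(kl_regularized_policy(q, anchor, lam=10.0))  # nearly the anchor itself
```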

No-Press Diplomacy from Scratch

1 code implementation NeurIPS 2021 Anton Bakhtin, David Wu, Adam Lerer, Noam Brown

Additionally, we extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.

Starcraft

A Fine-Tuning Approach to Belief State Modeling

no code implementations ICLR 2022 Samuel Sokota, Hengyuan Hu, David J Wu, J Zico Kolter, Jakob Nicolaus Foerster, Noam Brown

Furthermore, because this specialization occurs after the action or policy has already been decided, BFT does not require the belief model to process it as input.

Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings

no code implementations 16 Jun 2021 Hengyuan Hu, Adam Lerer, Noam Brown, Jakob Foerster

Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games.

counterfactual

Off-Belief Learning

5 code implementations 6 Mar 2021 Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster

Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions, and thus fail when paired with humans or independently trained agents at test time.

Safe Search for Stackelberg Equilibria in Extensive-Form Games

no code implementations 2 Feb 2021 Chun Kai Ling, Noam Brown

Stackelberg equilibrium is a solution concept in two-player games where the leader has commitment rights over the follower.
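
As a toy illustration of commitment, the sketch below restricts the leader to pure strategies (the paper studies the general setting, including mixed commitments): for each possible commitment the follower best-responds, and the leader keeps the commitment that earns it the most. The payoff matrices are invented for this example.

```python
import numpy as np

# Hypothetical 2x3 bimatrix game: rows = leader actions, cols = follower actions.
leader_payoff = np.array([[2.0, 2.0, 1.0],
                          [3.0, 1.0, 0.0]])
follower_payoff = np.array([[1.0, 2.0, 0.0],
                            [0.0, 1.0, 3.0]])

best = None
for row in range(leader_payoff.shape[0]):
    col = int(np.argmax(follower_payoff[row]))   # follower best-responds to the commitment
    value = leader_payoff[row, col]
    if best is None or value > best[0]:
        best = (value, row, col)

print("leader commits to row", best[1],
      "follower plays col", best[2],
      "leader value", best[0])
```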

Human-Level Performance in No-Press Diplomacy via Equilibrium Search

no code implementations ICLR 2021 Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown

Prior AI breakthroughs in complex games have focused on either the purely adversarial or purely cooperative settings.

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

1 code implementation NeurIPS 2020 Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong

This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game.

Reinforcement Learning (RL)

Unlocking the Potential of Deep Counterfactual Value Networks

no code implementations 20 Jul 2020 Ryan Zarick, Bryan Pellegrino, Noam Brown, Caleb Banister

Deep counterfactual value networks combined with continual resolving provide a way to conduct depth-limited search in imperfect-information games.

counterfactual

DREAM: Deep Regret minimization with Advantage baselines and Model-free learning

1 code implementation 18 Jun 2020 Eric Steinberger, Adam Lerer, Noam Brown

We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents.

Reinforcement Learning (RL)

Improving Policies via Search in Cooperative Partially Observable Games

10 code implementations 5 Dec 2019 Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown

The first one, single-agent search, effectively converts the problem into a single-agent setting by making all but one of the agents play according to the agreed-upon policy.

Game of Hanabi
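
The single-agent search idea can be sketched as a Monte Carlo procedure: sample hidden states consistent with the searcher's beliefs, roll out each candidate action with every other agent following the agreed-upon blueprint policy, and keep the action with the best average return. The simulator interface below (sample_state, simulate) is hypothetical and merely stands in for a real game environment.

```python
import random
from typing import Callable, Sequence

def single_agent_search(
    candidate_actions: Sequence[int],
    sample_state: Callable[[], object],        # draws a hidden state consistent with the searcher's beliefs
    simulate: Callable[[object, int], float],  # rolls out with others on the blueprint, returns the return
    num_samples: int = 100,
) -> int:
    """Pick the action with the highest Monte Carlo return estimate,
    holding all other agents fixed to the blueprint policy."""
    best_action, best_value = None, float("-inf")
    for action in candidate_actions:
        total = sum(simulate(sample_state(), action) for _ in range(num_samples))
        value = total / num_samples
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Toy usage with a fake one-step game in which action 1 is better on average.
print(single_agent_search(
    candidate_actions=[0, 1],
    sample_state=lambda: random.random(),
    simulate=lambda state, action: state + (0.5 if action == 1 else 0.0),
))
```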

Stable-Predictive Optimistic Counterfactual Regret Minimization

no code implementations 13 Feb 2019 Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm

The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice.

counterfactual

Deep Counterfactual Regret Minimization

4 code implementations 1 Nov 2018 Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm

This paper introduces Deep Counterfactual Regret Minimization, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game.

counterfactual

Solving Imperfect-Information Games via Discounted Regret Minimization

3 code implementations 11 Sep 2018 Noam Brown, Tuomas Sandholm

Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games.

counterfactual
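
Discounting in this family of algorithms amounts to reweighting accumulated regrets and the average strategy as iterations progress. The sketch below shows one such bookkeeping step at a single information set, using the commonly cited DCFR exponents (alpha=1.5, beta=0, gamma=2); the exact ordering of the add-then-discount steps here is an illustrative choice rather than a quotation from the paper.

```python
import numpy as np

def dcfr_update(cum_regret, cum_strategy, inst_regret, t, alpha=1.5, beta=0.0, gamma=2.0):
    """One discounted-CFR bookkeeping step at an information set.
    Positive cumulative regrets are scaled by t^alpha/(t^alpha+1), negative
    ones by t^beta/(t^beta+1), and the running average strategy by (t/(t+1))^gamma."""
    cum_regret = cum_regret + inst_regret
    pos_w = t**alpha / (t**alpha + 1.0)
    neg_w = t**beta / (t**beta + 1.0)
    cum_regret = np.where(cum_regret > 0, cum_regret * pos_w, cum_regret * neg_w)

    # Regret matching on the discounted cumulative regrets.
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    strategy = positive / total if total > 0 else np.full_like(positive, 1.0 / len(positive))
    cum_strategy = cum_strategy * ((t / (t + 1.0)) ** gamma) + strategy
    return cum_regret, cum_strategy, strategy

# Toy usage with made-up instantaneous regrets.
r, s = np.zeros(3), np.zeros(3)
for t in range(1, 6):
    r, s, current = dcfr_update(r, s, np.array([0.2, -0.1, 0.05]), t)
print(current)  # regret-matching strategy after five discounted updates
```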

Depth-Limited Solving for Imperfect-Information Games

no code implementations NeurIPS 2018 Noam Brown, Tuomas Sandholm, Brandon Amos

This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit.
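
One way to picture the depth-limit construction: each node at the depth limit carries one value per opponent continuation strategy, and in the zero-sum setting the opponent is assumed to pick whichever continuation is worst for us. The tiny tree and numbers below are invented purely for illustration.

```python
def depth_limited_value(node, depth):
    """Evaluate a toy two-player zero-sum tree from our perspective.
    At the depth limit the opponent chooses among several continuation
    strategies, so the leaf carries a list of values and we take the min."""
    if depth == 0 or not node.get("children"):
        return min(node["continuation_values"])  # opponent picks the worst one for us
    if node["to_move"] == "us":
        return max(depth_limited_value(c, depth - 1) for c in node["children"])
    return min(depth_limited_value(c, depth - 1) for c in node["children"])

# Hypothetical tiny tree: at depth 1 every child is a depth-limit node.
tree = {"to_move": "us", "children": [
    {"continuation_values": [3.0, 1.5, 2.0]},   # opponent would pick 1.5
    {"continuation_values": [2.5, 2.2, 4.0]},   # opponent would pick 2.2
]}
print(depth_limited_value(tree, depth=1))  # 2.2: we prefer the second action
```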

Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning

no code implementations ICML 2017 Noam Brown, Tuomas Sandholm

Iterative algorithms such as Counterfactual Regret Minimization (CFR) are the most popular way to solve large zero-sum imperfect-information games.

counterfactual

Safe and Nested Subgame Solving for Imperfect-Information Games

no code implementations NeurIPS 2017 Noam Brown, Tuomas Sandholm

Thus, unlike in perfect-information games, a subgame cannot be solved in isolation; its solution must instead account for the strategy for the entire game as a whole.

Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning

no code implementations ICML 2017 Noam Brown, Tuomas Sandholm

Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games.

counterfactual

Regret-Based Pruning in Extensive-Form Games

no code implementations NeurIPS 2015 Noam Brown, Tuomas Sandholm

CFR is an iterative algorithm that repeatedly traverses the game tree, updating regrets at each information set. We introduce an improvement to CFR that prunes any path of play in the tree, and its descendants, that has negative regret.

counterfactual
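
The pruning intuition can be shown with plain regret matching: whenever some action has positive cumulative regret, actions with non-positive regret receive zero probability, so the traversal beneath them can be skipped. The sketch below only shows that skip decision; it omits the full CFR traversal and the paper's criterion for how long a branch may safely stay pruned.

```python
import numpy as np

def regret_matching(cum_regret):
    """Current strategy from cumulative regrets (uniform if none are positive)."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full_like(positive, 1.0 / len(positive))

def actions_to_traverse(cum_regret):
    """Regret-based pruning, schematically: only descend into actions that
    regret matching currently plays with positive probability."""
    strategy = regret_matching(cum_regret)
    return [a for a, p in enumerate(strategy) if p > 0.0]

cum_regret = np.array([1.2, -0.7, 0.3])
print(actions_to_traverse(cum_regret))  # [0, 2]: the negative-regret branch is skipped
```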
