Search Results for author: Samuel Sokota

Found 14 papers, 5 papers with code

The Update-Equivalence Framework for Decision-Time Planning

no code implementations · 25 Apr 2023 · Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.
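
As a rough, illustrative sketch of the mirror-descent-style updates these search algorithms build on (not code from the paper; the function name, toy Q-values, and step size are assumptions of mine), the standard negative-entropy mirror descent update reduces to a multiplicative-weights step over action probabilities:

```python
import numpy as np

def mirror_descent_step(policy, q_values, lr):
    """One negative-entropy mirror descent (multiplicative-weights) update.

    policy:   current action probabilities, shape (n_actions,)
    q_values: estimated action values, shape (n_actions,)
    lr:       step size
    """
    logits = np.log(policy) + lr * q_values
    logits -= logits.max()              # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

# Toy usage: three actions, uniform start, a few update steps.
pi = np.ones(3) / 3
q = np.array([1.0, 0.5, 0.0])
for _ in range(10):
    pi = mirror_descent_step(pi, q, lr=0.5)
print(pi)  # probability mass concentrates on the highest-value action
```

The magnetic variant mentioned for adversarial games additionally regularizes toward a fixed "magnet" policy; a sketch of that closed-form update appears under the magnetic mirror descent paper further down this page.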

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

no code implementations · 19 Mar 2023 · Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior.

Multi-agent Reinforcement Learning · reinforcement-learning +1

Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

no code implementations · 22 Jan 2023 · Samuel Sokota, Ryan D'Orazio, Chun Kai Ling, David J. Wu, J. Zico Kolter, Noam Brown

Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and yields a simplified framework for decision-time planning in such games, free of the unappealing properties that plague existing decision-time planning approaches.

Vocal Bursts Valence Prediction
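
For context on the claim above that regularized equilibria can be made arbitrarily close to Nash equilibria, the display below shows a generic entropy-regularized (quantal-response-style) saddle-point formulation for a two-player zero-sum game with payoff matrix A; this is a standard formulation in my own notation, not an equation taken from the paper.

```latex
% Entropy-regularized saddle point for a two-player zero-sum game with payoff matrix A.
% \alpha > 0 is a temperature; H denotes Shannon entropy over mixed strategies.
\max_{x \in \Delta_m} \; \min_{y \in \Delta_n} \;
  x^\top A y \;+\; \alpha H(x) \;-\; \alpha H(y)
% For \alpha > 0 the regularized equilibrium is unique, and as \alpha \to 0
% it approaches the set of Nash equilibria of the unregularized game.
```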

Perfectly Secure Steganography Using Minimum Entropy Coupling

1 code implementation · 24 Oct 2022 · Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier

Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning.
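
As a rough sketch of the minimum-entropy-coupling building block: given a marginal over secret messages and a marginal over covertext symbols, one can greedily construct a low-entropy joint distribution with those marginals. The procedure below follows the well-known greedy approximation from the minimum-entropy-coupling literature and is illustrative only; it is not the exact iterative scheme used in the paper, and the variable names are my own.

```python
import numpy as np

def greedy_min_entropy_coupling(p, q):
    """Greedy approximate minimum entropy coupling of two marginals.

    Repeatedly matches the largest remaining mass in p with the largest
    remaining mass in q, transferring as much probability as possible.
    Returns a joint matrix M whose row sums recover p and column sums recover q.
    """
    p, q = p.astype(float).copy(), q.astype(float).copy()
    M = np.zeros((len(p), len(q)))
    while p.sum() > 1e-12:
        i, j = int(np.argmax(p)), int(np.argmax(q))
        m = min(p[i], q[j])
        M[i, j] += m
        p[i] -= m
        q[j] -= m
    return M

# Toy usage: couple a message distribution with a covertext distribution.
p = np.array([0.5, 0.3, 0.2])       # secret message marginal
q = np.array([0.4, 0.4, 0.1, 0.1])  # covertext marginal
M = greedy_min_entropy_coupling(p, q)
print(M.sum(axis=1), M.sum(axis=0))  # marginals match p and q (up to rounding)
```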

Self-Explaining Deviations for Coordination

no code implementations · 13 Jul 2022 · Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob N. Foerster

Fully cooperative, partially observable multi-agent problems are ubiquitous in the real world.

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

3 code implementations · 12 Jun 2022 · Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.

MuJoCo Games · reinforcement-learning +1
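
As a hedged sketch of the magnetic mirror descent update in the tabular case with the negative-entropy mirror map: in addition to the usual proximal term, the update regularizes toward a fixed "magnet" policy with strength alpha, which admits the closed form below. The variable names and toy values are my own; see the paper for the precise statement and convergence conditions.

```python
import numpy as np

def magnetic_mirror_descent_step(policy, magnet, q_values, lr, alpha):
    """Closed-form MMD update with the negative-entropy mirror map.

    Solves  argmax_pi  <pi, q> - alpha*KL(pi, magnet) - (1/lr)*KL(pi, policy),
    giving  pi'(a) ∝ [policy(a) * exp(lr*q(a)) * magnet(a)**(lr*alpha)]**(1/(1+lr*alpha)).
    """
    logits = np.log(policy) + lr * q_values + lr * alpha * np.log(magnet)
    logits /= (1.0 + lr * alpha)
    logits -= logits.max()  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum()

# Toy usage: regularize toward a uniform magnet policy.
pi = np.ones(3) / 3
magnet = np.ones(3) / 3
q = np.array([1.0, 0.5, 0.0])
for _ in range(50):
    pi = magnetic_mirror_descent_step(pi, magnet, q, lr=0.1, alpha=0.5)
print(pi)  # approaches a softmax of q/alpha weighted by the magnet policy
```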

Learning Intuitive Policies Using Action Features

no code implementations · 29 Jan 2022 · Mingwei Ma, Jizhou Liu, Samuel Sokota, Max Kleiman-Weiner, Jakob Foerster

An unaddressed challenge in multi-agent coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations.

Inductive Bias

Monte Carlo Tree Search With Iteratively Refining State Abstractions

no code implementations · NeurIPS 2021 · Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter

In this work, we present a method, called abstraction refining, for extending MCTS to stochastic environments which, unlike progressive widening, leverages the geometry of the state space.
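
For context on the baseline mentioned in the snippet: progressive widening caps the number of sampled child states of a node as a sublinear function of its visit count. The sketch below shows that standard rule; it illustrates the baseline being compared against, not the paper's abstraction-refining method, and the constants are illustrative.

```python
import random

def should_widen(num_children, num_visits, c=1.0, alpha=0.5):
    """Standard progressive widening rule: allow a new sampled child state
    only while |children| < c * N(s)**alpha."""
    return num_children < c * num_visits ** alpha

# Toy usage inside an MCTS-style loop for a stochastic transition.
children = []
for visit in range(1, 101):
    if should_widen(len(children), visit):
        children.append(random.random())   # sample a fresh next state
    else:
        random.choice(children)            # revisit an existing sampled state
print(len(children))  # grows roughly like sqrt(visits) when alpha = 0.5
```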

A Fine-Tuning Approach to Belief State Modeling

no code implementations · ICLR 2022 · Samuel Sokota, Hengyuan Hu, David J Wu, J Zico Kolter, Jakob Nicolaus Foerster, Noam Brown

Furthermore, because this specialization occurs after the action or policy has already been decided, BFT does not require the belief model to process it as input.

Zero-Shot Coordination via Semantic Relationships Between Actions and Observations

no code implementations · 29 Sep 2021 · Mingwei Ma, Jizhou Liu, Samuel Sokota, Max Kleiman-Weiner, Jakob Nicolaus Foerster

An unaddressed challenge in zero-shot coordination is to take advantage of the semantic relationship between the features of an action and the features of observations.

Inductive Bias

Communicating via Markov Decision Processes

1 code implementation · 17 Jul 2021 · Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Martin Strohmeier, J. Zico Kolter, Shimon Whiteson, Jakob Foerster

We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME.

Multi-agent Reinforcement Learning
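
For reference, the maximum entropy reinforcement learning objective that the MEME construction combines with minimum entropy coupling augments expected return with a policy-entropy bonus; a generic form, in my notation with temperature \lambda, is:

```latex
% Maximum entropy RL objective: expected return plus an entropy bonus with temperature \lambda.
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}
  \Big( r(s_t, a_t) \;+\; \lambda\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big)\right]
```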

Solving Common-Payoff Games with Approximate Policy Iteration

2 code implementations · 11 Jan 2021 · Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

While this choice precludes CAPI from scaling to games as large as Hanabi, empirical results demonstrate that, on the games to which CAPI does scale, it is capable of discovering optimal joint policies even when other modern multi-agent reinforcement learning algorithms are unable to do so.

Multi-agent Reinforcement Learning · reinforcement-learning +1

Selective Dyna-style Planning Under Limited Model Capacity

no code implementations · ICML 2020 · Zaheer Abbas, Samuel Sokota, Erin J. Talvitie, Martha White

We show that heteroscedastic regression can signal predictive uncertainty arising from model inadequacy that is complementary to the uncertainty detected by methods designed for parameter uncertainty. This indicates that accounting for both parameter uncertainty and model inadequacy may be a more promising direction for effective selective planning than either in isolation.

Model-based Reinforcement Learning
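
As a minimal illustration of heteroscedastic regression as an uncertainty signal: the model predicts an input-dependent mean and variance and is trained with the Gaussian negative log-likelihood, so inputs it cannot fit well are pushed toward large predicted variance. The sketch below is a generic formulation in NumPy, not the paper's experimental setup.

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Heteroscedastic regression loss: negative log-likelihood of y under
    N(mu, exp(log_var)), averaged over the batch. Targets the model cannot
    fit are assigned large predicted variance, which can then serve as a
    signal to avoid planning with untrustworthy model predictions."""
    var = np.exp(log_var)
    return np.mean(0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var))

# Toy usage: the second prediction is badly off, so a larger predicted
# variance lowers its contribution to the loss.
y = np.array([1.0, 5.0])
mu = np.array([1.1, 2.0])
print(gaussian_nll(y, mu, log_var=np.array([0.0, 0.0])))
print(gaussian_nll(y, mu, log_var=np.array([0.0, 2.0])))
```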

Simultaneous Prediction Intervals for Patient-Specific Survival Curves

1 code implementation · 25 Jun 2019 · Samuel Sokota, Ryan D'Orazio, Khurram Javed, Humza Haider, Russell Greiner

In this paper, we demonstrate that an existing method for estimating simultaneous prediction intervals from samples can easily be adapted for patient-specific survival curve analysis and yields accurate results.

Descriptive Prediction Intervals +2
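
As a rough sketch of one samples-based route to simultaneous (rather than pointwise) intervals: widen a symmetric quantile band until the desired fraction of sampled curves lies entirely inside it at every time point. This construction and its variable names are assumptions of mine for illustration; the paper's exact adaptation may differ.

```python
import numpy as np

def simultaneous_band(samples, coverage=0.9):
    """Simultaneous interval band from sampled curves.

    samples: array of shape (n_curves, n_timepoints), e.g. bootstrap or
             posterior draws of a patient-specific survival curve.
    Returns (lower, upper) such that at least `coverage` of the sampled
    curves lie inside the band at *all* time points simultaneously.
    """
    # Start from the narrowest symmetric quantile band (the median) and
    # widen until the required fraction of curves fits entirely inside.
    for q in np.linspace(0.5, 0.0, 501):
        lower = np.quantile(samples, q, axis=0)
        upper = np.quantile(samples, 1.0 - q, axis=0)
        inside = np.all((samples >= lower) & (samples <= upper), axis=1)
        if inside.mean() >= coverage:
            return lower, upper
    return lower, upper  # widest band as a fallback

# Toy usage: 200 sampled "survival" curves with different decay rates.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
curves = np.exp(-np.outer(rng.uniform(0.5, 2.0, size=200), t))
low, high = simultaneous_band(curves, coverage=0.9)
print(low[0], high[0])  # both equal 1.0 at t = 0, where every curve starts
```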
