Search Results for author: Tuomas Sandholm

Found 65 papers, 5 papers with code

Algorithms for Closed Under Rational Behavior (CURB) Sets

no code implementations16 Jan 2014 Michael Benisch, George B. Davis, Tuomas Sandholm

This algorithm serves as a subroutine in a series of polynomial-time algorithms for finding all minimal CURB sets, one minimal CURB set, and the smallest minimal CURB set in a game.

Regret-Based Pruning in Extensive-Form Games

no code implementations NeurIPS 2015 Noam Brown, Tuomas Sandholm

CFR is an iterative algorithm that repeatedly traverses the game tree, updating regrets at each information set. We introduce an improvement to CFR that prunes any path of play in the tree that has negative regret, along with its descendants.

counterfactual
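
As a rough illustration of the pruning idea described in this entry, here is a minimal sketch (assumed names and structure, not the paper's implementation) of regret matching at one information set in which branches whose cumulative regret is negative are skipped during the traversal:

```python
import numpy as np

def regret_matching(cum_regret):
    """Current strategy: play positive-regret actions proportionally (uniform if none)."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(pos, 1.0 / len(pos))

def cfr_update_with_pruning(infoset, cum_regret, action_value_fn):
    """One CFR-style update at `infoset`; negative-regret branches are not traversed."""
    strategy = regret_matching(cum_regret)
    values = np.zeros_like(cum_regret)
    for a in range(len(cum_regret)):
        if cum_regret[a] < 0:
            continue                   # pruned: this branch and its descendants are skipped
        values[a] = action_value_fn(infoset, a)  # recursive traversal in a full implementation
    node_value = float(strategy @ values)
    cum_regret += values - node_value  # standard counterfactual regret accumulation
    return node_value
```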

Small Representations of Big Kidney Exchange Graphs

no code implementations25 May 2016 John P. Dickerson, Aleksandr M. Kazachkov, Ariel D. Procaccia, Tuomas Sandholm

This growth results in more lives saved, but exacerbates the empirical hardness of the $\mathcal{NP}$-complete problem of optimally matching patients to donors.

Hardness of the Pricing Problem for Chains in Barter Exchanges

no code implementations1 Jun 2016 Benjamin Plaut, John P. Dickerson, Tuomas Sandholm

One of the leading techniques has been branch and price, where column generation is used to incrementally bring cycles and chains into the optimization model on an as-needed basis.
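
For context, branch and price interleaves branch-and-bound search with column generation; a generic column-generation loop looks roughly like the sketch below (all helper names are hypothetical placeholders), with the pricing oracle being the step whose hardness for chains this paper studies:

```python
# A generic column-generation loop (a sketch, not the paper's branch-and-price code):
# repeatedly solve a restricted LP over the cycles/chains added so far, then ask a
# pricing oracle for a new cycle or chain with positive reduced cost.
def column_generation(restricted_lp, pricing_oracle, max_iters=1000):
    columns = []                              # cycles/chains currently in the model
    for _ in range(max_iters):
        solution, duals = restricted_lp.solve(columns)
        new_column = pricing_oracle(duals)    # e.g. best chain under the dual prices
        if new_column is None:                # no positive-reduced-cost column exists
            return solution                   # restricted LP is optimal for the full LP
        columns.append(new_column)
    return solution
```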

Position-Indexed Formulations for Kidney Exchange

no code implementations6 Jun 2016 John P. Dickerson, David F. Manlove, Benjamin Plaut, Tuomas Sandholm, James Trimble

The recent introduction of chains, where a donor without a paired patient triggers a sequence of donations without requiring a kidney in return, increased the efficacy of fielded kidney exchanges, while also dramatically raising the empirical computational hardness of clearing the market in practice.

Position

Sample Complexity of Automated Mechanism Design

no code implementations NeurIPS 2016 Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik

In traditional economic models, it is assumed that the bidders' valuations are drawn from an underlying distribution and that the auction designer has perfect knowledge of this distribution.

Combinatorial Optimization Learning Theory

Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning

no code implementations ICML 2017 Noam Brown, Tuomas Sandholm

Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games.

counterfactual

Theoretical and Practical Advances on Smoothing for Extensive-Form Games

no code implementations16 Feb 2017 Christian Kroer, Kevin Waugh, Fatma Kilinc-Karzan, Tuomas Sandholm

By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player.

counterfactual
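
For reference, distance-generating functions of the dilated type on sequence-form strategy spaces have the following general shape (a standard construction shown purely as an illustration; the paper's contribution is a new choice of the information-set weights $\beta_j$):

$$d(x) \;=\; \sum_{j \in \mathcal{J}} \beta_j \, x_{p_j} \sum_{a \in A_j} \frac{x_{j,a}}{x_{p_j}} \log \frac{x_{j,a}}{x_{p_j}},$$

where $\mathcal{J}$ is the set of the player's information sets, $A_j$ the actions available at information set $j$, and $x_{p_j}$ the sequence-form probability of the parent sequence leading to $j$.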

Safe and Nested Subgame Solving for Imperfect-Information Games

no code implementations NeurIPS 2017 Noam Brown, Tuomas Sandholm

Thus, unlike in perfect-information games, a subgame cannot be solved in isolation; subgame solving must instead consider the strategy for the entire game as a whole.

Translation

Operation Frames and Clubs in Kidney Exchange

no code implementations25 May 2017 Gabriele Farina, John P. Dickerson, Tuomas Sandholm

A kidney exchange is a centrally administered barter market where patients swap their willing yet incompatible donors.

Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning

no code implementations ICML 2017 Noam Brown, Tuomas Sandholm

Iterative algorithms such as Counterfactual Regret Minimization (CFR) are the most popular way to solve large zero-sum imperfect-information games.

counterfactual

Regret Minimization in Behaviorally-Constrained Zero-Sum Games

no code implementations ICML 2017 Gabriele Farina, Christian Kroer, Tuomas Sandholm

We use an instantiation of the CFR framework to develop algorithms for solving behaviorally-constrained (and, as a special case, perturbed in the Selten sense) extensive-form games, which allows us to compute approximate Nash equilibrium refinements.

counterfactual

Robust Stackelberg Equilibria in Extensive-Form Games and Extension to Limited Lookahead

no code implementations21 Nov 2017 Christian Kroer, Gabriele Farina, Tuomas Sandholm

We then extend the program to the robust setting for Stackelberg equilibrium under unlimited and under limited lookahead by the opponent.

Learning to Branch

no code implementations ICML 2018 Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, Ellen Vitercik

Tree search algorithms recursively partition the search space to find an optimal solution.

Variable Selection
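
As a toy illustration of the kind of parameterized variable-selection rule that can be tuned from data (my sketch, not the paper's exact class of rules), one can interpolate between two standard branching scores and learn the mixing parameter from training instances:

```python
# Illustrative parameterized variable selection in branch and bound. The scoring
# rules score_a/score_b and the mixing parameter mu are assumptions; a data-driven
# approach would tune mu on a training set of integer programs.
def choose_branching_variable(fractional_vars, score_a, score_b, mu):
    """fractional_vars: candidate variables with fractional LP values;
    score_a, score_b: per-variable scoring rules; mu in [0, 1]."""
    def mixed_score(var):
        return mu * score_a(var) + (1 - mu) * score_b(var)
    return max(fractional_vars, key=mixed_score)
```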

Depth-Limited Solving for Imperfect-Information Games

no code implementations NeurIPS 2018 Noam Brown, Tuomas Sandholm, Brandon Amos

This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit.

Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games

no code implementations10 Sep 2018 Gabriele Farina, Christian Kroer, Tuomas Sandholm

Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of counterfactual regret minimization for computing Nash equilibrium, and therefore our approach leads to the first algorithm for computing quantal response equilibria in extremely large games.

counterfactual Decision Making

Solving Imperfect-Information Games via Discounted Regret Minimization

3 code implementations11 Sep 2018 Noam Brown, Tuomas Sandholm

Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games.

counterfactual
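
The discounting idea behind this line of work can be sketched as follows (an assumed parameterization in the spirit of discounted CFR; consult the paper for the actual scheme and recommended parameters):

```python
import numpy as np

def discounted_regret_update(cum_regret, inst_regret, t, alpha=1.5, beta=0.0):
    """Damp accumulated positive and negative regrets with different weights at
    iteration t before adding the new instantaneous regret (illustrative only)."""
    pos_w = t**alpha / (t**alpha + 1)   # weight applied to positive accumulated regrets
    neg_w = t**beta / (t**beta + 1)     # weight applied to negative accumulated regrets
    damped = np.where(cum_regret > 0, cum_regret * pos_w, cum_regret * neg_w)
    return damped + inst_regret
```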

Solving Large Sequential Games with the Excessive Gap Technique

no code implementations NeurIPS 2018 Christian Kroer, Gabriele Farina, Tuomas Sandholm

We present, to our knowledge, the first GPU implementation of a first-order method for extensive-form games.

counterfactual

Deep Counterfactual Regret Minimization

4 code implementations1 Nov 2018 Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm

This paper introduces Deep Counterfactual Regret Minimization, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game.

counterfactual
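
A minimal sketch of the core idea (assumptions throughout; not the paper's architecture or training loop): replace the per-information-set regret table with a network that maps information-set features to per-action advantages, and recover a strategy by regret matching on the network's outputs.

```python
import torch
import torch.nn as nn

class AdvantageNet(nn.Module):
    """Maps information-set features to predicted per-action advantages."""
    def __init__(self, infoset_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(infoset_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, infoset_features):
        return self.net(infoset_features)

def strategy_from_advantages(advantages):
    """Regret matching over predicted advantages (uniform if none are positive)."""
    pos = torch.clamp(advantages, min=0.0)
    total = pos.sum(dim=-1, keepdim=True)
    uniform = torch.full_like(pos, 1.0 / pos.shape[-1])
    return torch.where(total > 0, pos / total.clamp(min=1e-12), uniform)
```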

Regret Circuits: Composability of Regret Minimizers

no code implementations6 Nov 2018 Gabriele Farina, Christian Kroer, Tuomas Sandholm

We show that local regret minimizers for the simpler sets can be combined with additional regret minimizers into an aggregate regret minimizer for the composite set.
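
The simplest instance of such a composition, shown here purely as an illustration with assumed interfaces (recommend()/observe_loss() returning and consuming plain lists), is the Cartesian-product rule: a regret minimizer for $X \times Y$ runs independent regret minimizers for $X$ and $Y$ and splits each observed loss between them.

```python
class ProductRegretMinimizer:
    """Combines regret minimizers for X and Y into one for the product X x Y."""
    def __init__(self, rm_x, rm_y, dim_x):
        self.rm_x, self.rm_y, self.dim_x = rm_x, rm_y, dim_x

    def recommend(self):
        # Decisions for the product set are just the concatenated decisions.
        return list(self.rm_x.recommend()) + list(self.rm_y.recommend())

    def observe_loss(self, loss_vector):
        # A linear loss on X x Y decomposes into its X part and its Y part.
        self.rm_x.observe_loss(loss_vector[:self.dim_x])
        self.rm_y.observe_loss(loss_vector[self.dim_x:])
```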

A Unified Framework for Extensive-Form Game Abstraction with Bounds

no code implementations NeurIPS 2018 Christian Kroer, Tuomas Sandholm

In this paper we present a unified framework for analyzing abstractions that can express all types of abstractions and solution concepts used in prior papers with performance guarantees, while maintaining comparable bounds on abstraction quality.

Practical exact algorithm for trembling-hand equilibrium refinements in games

no code implementations NeurIPS 2018 Gabriele Farina, Nicola Gatti, Tuomas Sandholm

Nash equilibrium strategies have the known weakness that they do not prescribe rational play in situations that are reached with zero probability according to the strategies themselves, for example, if players have made mistakes.

Ex ante coordination and collusion in zero-sum multi-player extensive-form games

no code implementations NeurIPS 2018 Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and many non-recreational applications: war, where the colluders do not have time or means of communicating during battle; collusion in bidding, where communication during the auction is illegal; and coordinated swindling in public.

Stable-Predictive Optimistic Counterfactual Regret Minimization

no code implementations13 Feb 2019 Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm

The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice.

counterfactual

Limited Lookahead in Imperfect-Information Games

no code implementations17 Feb 2019 Christian Kroer, Tuomas Sandholm

We characterize the hardness of finding a Nash equilibrium or an optimal commitment strategy for either player, showing that in some of these variations the problem can be solved in polynomial time while in others it is PPAD-hard, NP-hard, or inapproximable.

Learning to Optimize Computational Resources: Frugal Training with Generalization Guarantees

no code implementations26 May 2019 Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik

Our algorithm can help compile a configuration portfolio, or it can be used to select the input to a configuration algorithm for finite parameter spaces.

Clustering

How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design

no code implementations8 Aug 2019 Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, Ellen Vitercik

We provide a broadly applicable theory for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance.

Clustering Generalization Bounds

Coarse Correlation in Extensive-Form Games

no code implementations26 Aug 2019 Gabriele Farina, Tommaso Bianchi, Tuomas Sandholm

Coarse correlation models strategic interactions of rational agents complemented by a correlation device, that is, a mediator that can recommend behavior but not enforce it.

Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions

no code implementations NeurIPS 2019 Gabriele Farina, Christian Kroer, Tuomas Sandholm

Our algorithms provably converge at a rate of $T^{-1}$, which is superior to the convergence rate of prior counterfactual regret minimization algorithms.

counterfactual

Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium

no code implementations NeurIPS 2019 Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm

We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer.

Stochastic Regret Minimization in Extensive-Form Games

no code implementations ICML 2020 Gabriele Farina, Christian Kroer, Tuomas Sandholm

Our framework allows us to instantiate several new stochastic methods for solving sequential games.

counterfactual

Efficient exploration of zero-sum stochastic games

no code implementations24 Feb 2020 Carlos Martin, Tuomas Sandholm

We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay, such as in financial or military simulations and computer games.

Efficient Exploration Thompson Sampling

Sparsified Linear Programming for Zero-Sum Equilibrium Finding

no code implementations ICML 2020 Brian Hu Zhang, Tuomas Sandholm

Computational equilibrium finding in large zero-sum extensive-form imperfect-information games has led to significant recent AI breakthroughs.

counterfactual

Refined bounds for algorithm configuration: The knife-edge of dual class approximability

no code implementations ICML 2020 Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik

We answer this question for algorithm configuration problems that exhibit a widely-applicable structure: the algorithm's performance as a function of its parameters can be approximated by a "simple" function.

Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent

no code implementations28 Jul 2020 Gabriele Farina, Christian Kroer, Tuomas Sandholm

In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework).

counterfactual
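
For reference, the (non-predictive) regret matching+ update on a probability simplex looks like the following sketch; the distinguishing feature is the thresholding of accumulated regrets at zero after every iteration (a textbook-style illustration, not the paper's predictive variants):

```python
import numpy as np

def rm_plus_step(q, loss_vector):
    """q: nonnegative accumulated regret-like vector; returns (strategy, new q)."""
    total = q.sum()
    strategy = q / total if total > 0 else np.full_like(q, 1.0 / len(q))
    instant_regret = float(strategy @ loss_vector) - loss_vector  # gain of each deviation
    q_new = np.maximum(q + instant_regret, 0.0)                   # the "+" thresholding
    return strategy, q_new
```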

Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond

no code implementations NeurIPS 2020 Gabriele Farina, Tuomas Sandholm

As of today, it is known that finding an optimal extensive-form correlated equilibrium (EFCE), extensive-form coarse correlated equilibrium (EFCCE), or normal-form coarse correlated equilibrium (NFCCE) in a two-player extensive-form game is computationally tractable when the game does not include chance moves, and intractable when the game involves chance moves.

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

no code implementations21 Sep 2020 Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.

Improving Policy-Constrained Kidney Exchange via Pre-Screening

1 code implementation NeurIPS 2020 Duncan C McElfresh, Michael Curry, Tuomas Sandholm, John P Dickerson

In barter exchanges, participants swap goods with one another without exchanging money; exchanges are often facilitated by a central clearinghouse, with the goal of maximizing the aggregate quality (or number) of swaps.

Generalization in portfolio-based algorithm selection

no code implementations24 Dec 2020 Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik

This algorithm configuration procedure works by first selecting a portfolio of diverse algorithm parameter settings, and then, on a given problem instance, using an algorithm selector to choose a parameter setting from the portfolio with strong predicted performance.
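
Schematically, the two-stage procedure described above might look like the sketch below (the helper names and the greedy construction are illustrative assumptions, not the paper's algorithm):

```python
def build_portfolio(candidate_params, train_instances, perf, k):
    """Greedily pick k parameter settings that together cover training instances well."""
    portfolio = []
    for _ in range(k):
        best = max(
            (p for p in candidate_params if p not in portfolio),
            key=lambda p: sum(
                max(perf(q, inst) for q in portfolio + [p])
                for inst in train_instances
            ),
        )
        portfolio.append(best)
    return portfolio

def select_from_portfolio(portfolio, instance, predicted_perf):
    """At run time, choose the portfolio member with the best predicted performance."""
    return max(portfolio, key=lambda p: predicted_perf(p, instance))
```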

Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

no code implementations8 Mar 2021 Gabriele Farina, Robin Schmucker, Tuomas Sandholm

Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment.

counterfactual Decision Making

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

no code implementations8 Mar 2021 Gabriele Farina, Tuomas Sandholm

We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.

counterfactual Decision Making

Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

no code implementations27 May 2021 Gabriele Farina, Christian Kroer, Tuomas Sandholm

The scaled extension operator is a way to recursively construct convex sets, which generalizes the decision polytope of extensive-form games, as well as the convex polytopes corresponding to correlated and team equilibria.

Faster No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

no code implementations29 Sep 2021 Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Tuomas Sandholm

A recent emerging trend in the literature on learning in games has been concerned with providing accelerated learning dynamics for correlated and coarse correlated equilibria in normal-form games.

Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

no code implementations11 Nov 2021 Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm

Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
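
For reference, one step of Optimistic Multiplicative Weights Update with loss feedback, using the previous loss vector as the prediction (a standard textbook form; the step size $\eta$ and this choice of prediction are stated as assumptions, not the paper's exact setup):

```python
import numpy as np

def omwu_step(x, loss_t, loss_prev, eta):
    """x: current strictly positive distribution over actions; returns the next one.
    Uses 2*loss_t - loss_prev as the optimistically corrected loss."""
    corrected = 2.0 * loss_t - loss_prev
    logits = np.log(x) - eta * corrected
    w = np.exp(logits - logits.max())   # stabilized exponentiation
    return w / w.sum()
```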

Improved Sample Complexity Bounds for Branch-and-Cut

no code implementations18 Nov 2021 Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, Ellen Vitercik

If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs.

Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium

no code implementations NeurIPS 2021 Gabriele Farina, Tuomas Sandholm

In this paper, we initiate the study of equilibrium refinements for settings where one of the players is perfectly rational (the "machine") and the other may make mistakes.

Anytime PSRO for Two-Player Zero-Sum Games

no code implementations19 Jan 2022 Stephen Mcaleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.

Multi-agent Reinforcement Learning reinforcement-learning +2
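
A compact sketch of the tabular double oracle loop that PSRO generalizes (illustrative only; the restricted-game solver and best-response oracles are assumed to be given):

```python
def double_oracle(game, br1, br2, solve_restricted, init1, init2):
    """Grow restricted strategy sets until neither player has an improving deviation."""
    S1, S2 = [init1], [init2]                 # restricted strategy sets
    while True:
        # Nash equilibrium of the restricted normal-form game over S1 x S2
        sigma1, sigma2 = solve_restricted(game, S1, S2)
        new1 = br1(game, sigma2)              # best response to the opponent's mixture
        new2 = br2(game, sigma1)
        if new1 in S1 and new2 in S2:         # no new best responses: converged
            return sigma1, sigma2
        if new1 not in S1: S1.append(new1)
        if new2 not in S2: S2.append(new2)
```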

Differentiable Economics for Randomized Affine Maximizer Auctions

no code implementations6 Feb 2022 Michael Curry, Tuomas Sandholm, John Dickerson

We present an architecture that supports multiple bidders and is perfectly strategyproof, but cannot necessarily represent the optimal mechanism.

Optimal Correlated Equilibria in General-Sum Extensive-Form Games: Fixed-Parameter Algorithms, Hardness, and Two-Sided Column-Generation

no code implementations14 Mar 2022 Brian Zhang, Gabriele Farina, Andrea Celli, Tuomas Sandholm

For team games, the two-sided column generation approach vastly outperforms standard column generation approaches, making it the state-of-the-art algorithm when the parameter is large.

Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts

no code implementations15 Apr 2022 Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, Ellen Vitercik

These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers.

BIG-bench Machine Learning

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

no code implementations25 Apr 2022 Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 T)$.

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

1 code implementation8 Jun 2022 Stephen Mcaleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).

counterfactual

Near-Optimal No-Regret Learning Dynamics for General Convex Games

no code implementations17 Jun 2022 Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm

In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general convex games, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets.

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

no code implementations13 Jul 2022 Stephen Mcaleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm

Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well.

Reinforcement Learning (RL)

Near-Optimal $Φ$-Regret Learning in Extensive-Form Games

no code implementations20 Aug 2022 Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.

Open-Ended Question Answering

Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks

no code implementations29 Nov 2022 Carlos Martin, Tuomas Sandholm

Being able to model such mixed strategies is crucial for tackling continuous-action games that lack pure-strategy equilibria.

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

no code implementations22 Jul 2023 Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Tuomas Sandholm, Furong Huang, Stephen Mcaleer

To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially-observable two-player zero-sum game.

Continuous Control reinforcement-learning +1

AI planning in the imagination: High-level planning on learned abstract search spaces

no code implementations16 Aug 2023 Carlos Martin, Tuomas Sandholm

However, in real-world environments, the model with respect to which the agent plans has been constrained to be grounded in the real environment itself, as opposed to a more abstract model which allows for planning over compound actions and behaviors.

Traveling Salesman Problem

Confronting Reward Model Overoptimization with Constrained RLHF

1 code implementation6 Oct 2023 Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Mcaleer

Large language models are typically aligned with human preferences by optimizing reward models (RMs) fitted to human feedback.

Scalable Mechanism Design for Multi-Agent Path Finding

no code implementations30 Jan 2024 Paul Friedrich, Yulun Zhang, Michael Curry, Ludwig Dierks, Stephen Mcaleer, Jiaoyang Li, Tuomas Sandholm, Sven Seuken

In this work, we introduce the problem of scalable mechanism design for MAPF and propose three strategyproof mechanisms, two of which even use approximate MAPF algorithms.

Multi-Agent Path Finding
