no code implementations • 4 Nov 2024 • Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm
A celebrated connection at the interface of online learning and game theory establishes that players minimizing swap regret converge to correlated equilibria (CE) -- a seminal game-theoretic solution concept.
no code implementations • 4 Nov 2024 • Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm
A celebrated result at the interface of online learning and game theory guarantees that the repeated interaction of no-regret players leads to a coarse correlated equilibrium (CCE) -- a natural game-theoretic solution concept.
no code implementations • 22 Jul 2024 • Redha Taguelmimt, Samir Aknine, Djamila Boukredera, Narayan Changder, Tuomas Sandholm
In this paper, we present a novel algorithm, SMART, for the problem based on a hybridization of three innovative techniques.
no code implementations • 23 Jun 2024 • Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul W. Goldberg, Vincent Conitzer
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before.
no code implementations • 12 Jun 2024 • Carlos Martin, Tuomas Sandholm
A well-known family of approaches to planning at execution time is AlphaZero and its variants, which use Monte Carlo Tree Search together with a neural network that guides the search by predicting state values and action probabilities.
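For orientation, here is a minimal sketch of the AlphaZero-style selection rule this entry refers to: a PUCT score combining each child's mean value with an exploration bonus weighted by the network's prior. The `Node` class and its field names are hypothetical, introduced only for illustration, not code from the paper.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                  # network's predicted probability of this action
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    @property
    def q(self) -> float:
        # Mean backed-up value; zero before any visits.
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node: Node, c_puct: float = 1.5):
    """Return the action maximizing Q + U, where U is an exploration bonus
    proportional to the network prior and shrinking with the visit count."""
    total = sum(child.visits for child in node.children.values())
    def score(child: Node) -> float:
        return child.q + c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
    return max(node.children.items(), key=lambda kv: score(kv[1]))[0]
```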
1 code implementation • 30 Jan 2024 • Paul Friedrich, Yulun Zhang, Michael Curry, Ludwig Dierks, Stephen Mcaleer, Jiaoyang Li, Tuomas Sandholm, Sven Seuken
Multi-Agent Path Finding (MAPF) involves determining collision-free paths along which multiple agents travel simultaneously through a shared area toward given goal locations.
no code implementations • 19 Dec 2023 • Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm
Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning.
1 code implementation • 6 Oct 2023 • Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Mcaleer
Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback.
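As background on how such reward models are typically fitted (a generic sketch of the standard Bradley-Terry objective on pairwise comparisons, not code from this paper):

```python
import numpy as np

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one human comparison:
    P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected)."""
    return float(-np.log(1.0 / (1.0 + np.exp(-(reward_chosen - reward_rejected)))))

# The loss shrinks as the reward model separates the two responses more clearly.
print(preference_loss(1.2, 0.3))   # ~0.34
print(preference_loss(3.0, -1.0))  # ~0.02
```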
no code implementations • 16 Aug 2023 • Carlos Martin, Tuomas Sandholm
However, in real-world environments, the model with respect to which the agent plans has been constrained to be grounded in the real environment itself, as opposed to a more abstract model which allows for planning over compound actions and behaviors.
no code implementations • 22 Jul 2023 • Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Benjamin Eysenbach, Tuomas Sandholm, Furong Huang, Stephen Mcaleer
To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
no code implementations • 20 Jan 2023 • Carlos Martin, Tuomas Sandholm
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
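For context, exploitability in a two-player zero-sum matrix game can be computed as below. This is a generic illustration of the quantity being approximated, not the paper's method (which targets games with continuous action spaces).

```python
import numpy as np

def exploitability(payoff: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Sum of both players' best-response gains against the profile (x, y)
    in a zero-sum matrix game where player 1 maximizes x^T A y.
    The value is nonnegative and equals zero exactly at a Nash equilibrium."""
    value = x @ payoff @ y
    gain_1 = np.max(payoff @ y) - value    # how much player 1 could gain by deviating
    gain_2 = value - np.min(x @ payoff)    # how much player 2 (the minimizer) could gain
    return float(gain_1 + gain_2)

rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])  # rock-paper-scissors
uniform = np.ones(3) / 3
print(exploitability(rps, uniform, uniform))  # 0.0: uniform play is the equilibrium
```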
no code implementations • 29 Nov 2022 • Carlos Martin, Tuomas Sandholm
Being able to model such mixed strategies is crucial for tackling continuous-action games that lack pure-strategy equilibria.
no code implementations • 20 Aug 2022 • Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm
In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.
no code implementations • 13 Jul 2022 • Stephen Mcaleer, JB Lanier, Kevin Wang, Pierre Baldi, Roy Fox, Tuomas Sandholm
Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population.
no code implementations • 17 Jun 2022 • Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm
In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general \emph{convex games}, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets.
1 code implementation • 8 Jun 2022 • Stephen Mcaleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).
no code implementations • 25 Apr 2022 • Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm
In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$.
no code implementations • 15 Apr 2022 • Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, Ellen Vitercik
These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers.
no code implementations • 14 Mar 2022 • Brian Zhang, Gabriele Farina, Andrea Celli, Tuomas Sandholm
We study the problem of finding optimal correlated equilibria of various sorts in extensive-form games: normal-form coarse correlated equilibrium (NFCCE), extensive-form coarse correlated equilibrium (EFCCE), and extensive-form correlated equilibrium (EFCE).
no code implementations • 6 Feb 2022 • Michael Curry, Tuomas Sandholm, John Dickerson
We present an architecture that supports multiple bidders and is perfectly strategyproof, but cannot necessarily represent the optimal mechanism.
no code implementations • 19 Jan 2022 • Stephen Mcaleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox
PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next.
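To make the tabular double oracle (DO) loop concrete, here is a small sketch for zero-sum matrix games; it illustrates the mechanism that PSRO generalizes, not this paper's code. The helper `solve_restricted` and all variable names are mine, and the sketch assumes access to the full payoff matrix (the tabular setting).

```python
import numpy as np
from scipy.optimize import linprog

def solve_restricted(payoff):
    """Row player's maximin strategy and value for a zero-sum matrix game, via an LP."""
    m, n = payoff.shape
    c = np.zeros(m + 1); c[-1] = -1.0                      # maximize the game value v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])         # v <= (x^T A)_j for every column j
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # x sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]

def double_oracle(payoff, iters=50):
    """Tabular double oracle on the full matrix `payoff` (row player maximizes):
    solve the restricted game, add each player's best response, repeat."""
    rows, cols = [0], [0]                                  # arbitrary singleton supports to start
    for _ in range(iters):
        sub = payoff[np.ix_(rows, cols)]
        x_sub, _ = solve_restricted(sub)
        y_sub, _ = solve_restricted(-sub.T)                # column player solves the negated game
        x = np.zeros(payoff.shape[0]); x[rows] = x_sub
        y = np.zeros(payoff.shape[1]); y[cols] = y_sub
        br_row = int(np.argmax(payoff @ y))                # best responses in the full game
        br_col = int(np.argmin(x @ payoff))
        if br_row in rows and br_col in cols:
            return x, y                                    # no profitable new action: done
        rows = sorted(set(rows) | {br_row}); cols = sorted(set(cols) | {br_col})
    return x, y
```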
no code implementations • NeurIPS 2021 • Gabriele Farina, Tuomas Sandholm
In this paper, we initiate the study of equilibrium refinements for settings where one of the players is perfectly rational (the "machine") and the other may make mistakes.
no code implementations • 18 Nov 2021 • Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, Ellen Vitercik
If the training set is too small, a configuration may have good performance over the training set but poor performance on future integer programs.
no code implementations • 11 Nov 2021 • Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm
Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
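For reference, the OMWU step being analyzed looks roughly like the following sketch (the standard optimistic-hedge form; variable names are mine).

```python
import numpy as np

def omwu_step(strategy: np.ndarray, utility_t: np.ndarray, utility_prev: np.ndarray,
              eta: float = 0.1) -> np.ndarray:
    """One Optimistic Multiplicative Weights Update. Plain MWU would exponentiate
    eta * utility_t; the optimistic variant uses eta * (2*u_t - u_{t-1}),
    i.e. it adds a prediction that the next utility resembles the last one."""
    new = strategy * np.exp(eta * (2.0 * utility_t - utility_prev))
    return new / new.sum()
```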
no code implementations • 29 Sep 2021 • Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Tuomas Sandholm
A recent emerging trend in the literature on learning in games has been concerned with providing accelerated learning dynamics for correlated and coarse correlated equilibria in normal-form games.
no code implementations • NeurIPS 2021 • Maria-Florina Balcan, Siddharth Prasad, Tuomas Sandholm, Ellen Vitercik
We first bound the sample complexity of learning cutting planes from the canonical family of Chvátal-Gomory cuts.
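As a reminder of the family in question (the standard textbook definition, not wording from the paper): for an integer program with constraints $Ax \le b$, $x \ge 0$ integral, every multiplier vector $u \ge 0$ yields the valid Chvátal-Gomory cut $\lfloor u^\top A \rfloor x \le \lfloor u^\top b \rfloor$; the learning question is, roughly, how many sampled instances suffice to choose such cut parameters well for an unknown distribution over integer programs.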
no code implementations • 27 May 2021 • Gabriele Farina, Christian Kroer, Tuomas Sandholm
The scaled extension operator is a way to recursively construct convex sets, which generalizes the decision polytope of extensive-form games, as well as the convex polytopes corresponding to correlated and team equilibria.
no code implementations • 8 Mar 2021 • Gabriele Farina, Robin Schmucker, Tuomas Sandholm
Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment.
no code implementations • 8 Mar 2021 • Gabriele Farina, Tuomas Sandholm
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
no code implementations • 24 Dec 2020 • Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
This algorithm configuration procedure works by first selecting a portfolio of diverse algorithm parameter settings, and then, on a given problem instance, using an algorithm selector to choose a parameter setting from the portfolio with strong predicted performance.
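A minimal sketch of the second stage of that pipeline, with hypothetical names (`portfolio`, `runtime_model`) standing in for the offline-chosen parameter settings and for any learned performance predictor:

```python
import numpy as np

def select_configuration(instance_features, portfolio, runtime_model):
    """Given a new problem instance, score every parameter setting in the
    (offline-built) portfolio with a learned performance predictor and return
    the setting with the best (lowest) predicted runtime."""
    predicted = [runtime_model(instance_features, params) for params in portfolio]
    return portfolio[int(np.argmin(predicted))]
```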
1 code implementation • NeurIPS 2020 • Duncan C McElfresh, Michael Curry, Tuomas Sandholm, John P Dickerson
In barter exchanges, participants swap goods with one another without exchanging money; exchanges are often facilitated by a central clearinghouse, with the goal of maximizing the aggregate quality (or number) of swaps.
no code implementations • 21 Sep 2020 • Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm
Second, we provide an algorithm that computes such an optimal distribution using only profiles in which a single team member gets to randomize.
no code implementations • NeurIPS 2020 • Gabriele Farina, Tuomas Sandholm
As of today, it is known that finding an optimal extensive-form correlated equilibrium (EFCE), extensive-form coarse correlated equilibrium (EFCCE), or normal-form coarse correlated equilibrium (NFCCE) in a two-player extensive-form game is computationally tractable when the game does not include chance moves, and intractable when the game involves chance moves.
no code implementations • 28 Jul 2020 • Gabriele Farina, Christian Kroer, Tuomas Sandholm
In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework).
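For context, a regret matching+ update at a single decision point looks roughly like this (a generic sketch of the standard RM+ rule, not this paper's contribution; names are mine).

```python
import numpy as np

def rm_plus_step(cum_regret: np.ndarray, utilities: np.ndarray,
                 strategy: np.ndarray) -> tuple:
    """One RM+ update: accumulate instantaneous regrets, clip the running total
    at zero (the '+' in RM+), and play proportionally to the positive regrets."""
    instantaneous = utilities - strategy @ utilities
    cum_regret = np.maximum(cum_regret + instantaneous, 0.0)
    total = cum_regret.sum()
    next_strategy = cum_regret / total if total > 0 else np.full_like(strategy, 1.0 / strategy.size)
    return cum_regret, next_strategy
```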
no code implementations • ICML 2020 • Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
We answer this question for algorithm configuration problems that exhibit a widely-applicable structure: the algorithm's performance as a function of its parameters can be approximated by a "simple" function.
no code implementations • ICML 2020 • Brian Hu Zhang, Tuomas Sandholm
Computational equilibrium finding in large zero-sum extensive-form imperfect-information games has led to significant recent AI breakthroughs.
no code implementations • 24 Feb 2020 • Carlos Martin, Tuomas Sandholm
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay, such as in financial or military simulations and computer games.
no code implementations • ICML 2020 • Gabriele Farina, Christian Kroer, Tuomas Sandholm
Our framework allows us to instantiate several new stochastic methods for solving sequential games.
no code implementations • NeurIPS 2019 • Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm
We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer.
no code implementations • NeurIPS 2019 • Gabriele Farina, Christian Kroer, Tuomas Sandholm
Our algorithms provably converge at a rate of $T^{-1}$, which is superior to prior counterfactual regret minimization algorithms.
no code implementations • 26 Aug 2019 • Gabriele Farina, Tommaso Bianchi, Tuomas Sandholm
Coarse correlation models strategic interactions of rational agents complemented by a correlation device, that is, a mediator that can recommend behavior but not enforce it.
no code implementations • 8 Aug 2019 • Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, Ellen Vitercik
We provide a broadly applicable theory for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance.
no code implementations • 26 May 2019 • Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
Our algorithm can help compile a configuration portfolio, or it can be used to select the input to a configuration algorithm for finite parameter spaces.
no code implementations • 17 Feb 2019 • Christian Kroer, Tuomas Sandholm
We characterize the hardness of finding a Nash equilibrium or an optimal commitment strategy for either player, showing that in some of these variations the problem can be solved in polynomial time while in others it is PPAD-hard, NP-hard, or inapproximable.
no code implementations • 13 Feb 2019 • Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm
The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice.
no code implementations • NeurIPS 2018 • Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm
This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, in collusion in poker, and in many non-recreational applications such as war (where the colluders have no time or means of communicating during battle), collusion in bidding (where communication during the auction is illegal), and coordinated swindling in public.
no code implementations • NeurIPS 2018 • Christian Kroer, Tuomas Sandholm
In this paper we present a unified framework for analyzing abstractions that can express all types of abstractions and solution concepts used in prior papers with performance guarantees, while maintaining comparable bounds on abstraction quality.
no code implementations • NeurIPS 2018 • Gabriele Farina, Nicola Gatti, Tuomas Sandholm
Nash equilibrium strategies have the known weakness that they do not prescribe rational play in situations that are reached with zero probability according to the strategies themselves, for example, if players have made mistakes.
no code implementations • 6 Nov 2018 • Gabriele Farina, Christian Kroer, Tuomas Sandholm
We show that local regret minimizers for the simpler sets can be combined with additional regret minimizers into an aggregate regret minimizer for the composite set.
4 code implementations • 1 Nov 2018 • Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm
This paper introduces Deep Counterfactual Regret Minimization, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game.
no code implementations • NeurIPS 2018 • Christian Kroer, Gabriele Farina, Tuomas Sandholm
We present, to our knowledge, the first GPU implementation of a first-order method for extensive-form games.
3 code implementations • 11 Sep 2018 • Noam Brown, Tuomas Sandholm
Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most popular and, in practice, fastest approach to approximately solving large imperfect-information games.
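In standard notation (a textbook sketch for orientation, not text from the paper): after $T$ iterations, CFR tracks at each information set $I$ the cumulative counterfactual regret $R^T(I,a) = \sum_{t=1}^{T} \big( v(\sigma^t_{I \to a}, I) - v(\sigma^t, I) \big)$, where $v(\sigma, I)$ is the counterfactual value of $I$ under strategy profile $\sigma$ and $\sigma_{I \to a}$ deviates to action $a$ at $I$; the next strategy at $I$ is obtained by regret matching on the positive parts of $R^T(I,\cdot)$.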
no code implementations • 10 Sep 2018 • Gabriele Farina, Christian Kroer, Tuomas Sandholm
Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of counterfactual regret minimization for computing Nash equilibrium, and our approach therefore yields the first algorithm for computing quantal response equilibria in extremely large games.
no code implementations • NeurIPS 2018 • Noam Brown, Tuomas Sandholm, Brandon Amos
This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit.
no code implementations • ICML 2018 • Maria-Florina Balcan, Travis Dick, Tuomas Sandholm, Ellen Vitercik
Tree search algorithms recursively partition the search space to find an optimal solution.
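As a concrete toy instance of the tree-search paradigm studied here (a self-contained illustration, not the paper's algorithm): best-first branch and bound for 0/1 knapsack, where each branch partitions the space into "take item i" and "skip item i" and a fractional relaxation prunes nodes that cannot beat the incumbent.

```python
import heapq

def branch_and_bound_knapsack(values, weights, capacity):
    """Best-first branch and bound for 0/1 knapsack; returns the optimal value."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def bound(idx, value, room):
        # Fractional-knapsack relaxation: an optimistic upper bound for this node.
        for i in order[idx:]:
            if weights[i] <= room:
                room -= weights[i]; value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    best = 0
    # Max-heap on the bound (negated for heapq); a node is (bound, depth, value, remaining room).
    frontier = [(-bound(0, 0, capacity), 0, 0, capacity)]
    while frontier:
        neg_b, idx, value, room = heapq.heappop(frontier)
        if -neg_b <= best or idx == n:
            continue                      # pruned: this subtree cannot improve the incumbent
        i = order[idx]
        if weights[i] <= room:            # branch 1: take item i
            taken = value + values[i]
            best = max(best, taken)
            heapq.heappush(frontier, (-bound(idx + 1, taken, room - weights[i]),
                                      idx + 1, taken, room - weights[i]))
        # branch 2: skip item i
        heapq.heappush(frontier, (-bound(idx + 1, value, room), idx + 1, value, room))
    return best

print(branch_and_bound_knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```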
no code implementations • 21 Nov 2017 • Christian Kroer, Gabriele Farina, Tuomas Sandholm
We then extend the program to the robust setting for Stackelberg equilibrium under unlimited and under limited lookahead by the opponent.
no code implementations • ICML 2017 • Gabriele Farina, Christian Kroer, Tuomas Sandholm
We use an instantiation of the CFR framework to develop algorithms for solving behaviorally-constrained (and, as a special case, perturbed in the Selten sense) extensive-form games, which allows us to compute approximate Nash equilibrium refinements.
no code implementations • ICML 2017 • Noam Brown, Tuomas Sandholm
Iterative algorithms such as Counterfactual Regret Minimization (CFR) are the most popular way to solve large zero-sum imperfect-information games.
no code implementations • 25 May 2017 • Gabriele Farina, John P. Dickerson, Tuomas Sandholm
A kidney exchange is a centrally-administered barter market where patients swap their willing yet incompatible donors.
no code implementations • NeurIPS 2017 • Noam Brown, Tuomas Sandholm
Thus a subgame cannot be solved in isolation; unlike in perfect-information games, one must instead consider the strategy for the entire game as a whole.
no code implementations • 29 Apr 2017 • Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
We study multi-item profit maximization when there is an underlying distribution over buyers' values.
no code implementations • 16 Feb 2017 • Christian Kroer, Kevin Waugh, Fatma Kilinc-Karzan, Tuomas Sandholm
By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player.
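For orientation, a dilated distance-generating function has the general shape (a standard construction, sketched here for context rather than quoted from the paper) $\varphi(x) = \sum_{j} \beta_j \, x_{p_j} \, \psi\!\big(x_j / x_{p_j}\big)$, where the sum runs over the player's decision points $j$, $x_{p_j}$ is the value of the parent sequence, $\psi$ is a local distance-generating function on the simplex (negative entropy in the dilated-entropy case), and the per-decision-point weights $\beta_j$ are what the new weighting scheme chooses so that the resulting guarantees no longer depend on the branching factor.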
no code implementations • ICML 2017 • Noam Brown, Tuomas Sandholm
Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games.
no code implementations • NeurIPS 2016 • Maria-Florina Balcan, Tuomas Sandholm, Ellen Vitercik
In the traditional economic models, it is assumed that the bidders' valuations are drawn from an underlying distribution and that the auction designer has perfect knowledge of this distribution.
no code implementations • 6 Jun 2016 • John P. Dickerson, David F. Manlove, Benjamin Plaut, Tuomas Sandholm, James Trimble
The recent introduction of chains, where a donor without a paired patient triggers a sequence of donations without requiring a kidney in return, increased the efficacy of fielded kidney exchanges, while also dramatically raising the empirical computational hardness of clearing the market in practice.
no code implementations • 1 Jun 2016 • Benjamin Plaut, John P. Dickerson, Tuomas Sandholm
One of the leading techniques has been branch and price, where column generation is used to incrementally bring cycles and chains into the optimization model on an as-needed basis.
no code implementations • 25 May 2016 • John P. Dickerson, Aleksandr M. Kazachkov, Ariel D. Procaccia, Tuomas Sandholm
This growth results in more lives saved, but exacerbates the empirical hardness of the $\mathcal{NP}$-complete problem of optimally matching patients to donors.
no code implementations • NeurIPS 2015 • Noam Brown, Tuomas Sandholm
CFR is an iterative algorithm that repeatedly traverses the game tree, updating regrets at each information set. We introduce an improvement to CFR that prunes any path of play in the tree that has negative regret, along with its descendants.
no code implementations • NeurIPS 2014 • Albert Jiang, Leandro Soriano Marcolino, Ariel D. Procaccia, Tuomas Sandholm, Nisarg Shah, Milind Tambe
We investigate the power of voting among diverse, randomized software agents.
no code implementations • 16 Jan 2014 • Michael Benisch, George B. Davis, Tuomas Sandholm
This algorithm serves as a subroutine in a series of polynomial-time algorithms for finding all minimal CURB sets, one minimal CURB set, and the smallest minimal CURB set in a game.