no code implementations • ICML 2020 • Tanner Fiez, Benjamin Chasnov, Lillian Ratliff

Contemporary work on learning in continuous games has commonly overlooked the hierarchical decision-making structure present in machine learning problems formulated as games, instead treating them as simultaneous-play games and adopting the Nash equilibrium solution concept.

no code implementations • NeurIPS 2021 • Tanner Fiez, Lillian Ratliff, Eric Mazumdar, Evan Faulkner, Adhyyan Narang

For the class of nonconvex-PL zero-sum games, we exploit timescale separation to construct a potential function that, when combined with the stability characterization and an asymptotic saddle avoidance result, gives a global asymptotic almost-sure convergence guarantee to a set of strict local minmax equilibria.

no code implementations • NeurIPS 2021 • Tanner Fiez, Ryann Sim, Stratis Skoulakis, Georgios Piliouras, Lillian Ratliff

Classical learning results build on this theorem to show that online no-regret dynamics converge to an equilibrium in a time-average sense in zero-sum games.
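As a rough illustration of time-average convergence, the sketch below runs the Hedge (multiplicative weights) update for both players of matching pennies and averages the strategies over time. The game, step size, horizon, and biased starting point are illustrative assumptions, not taken from the paper.

```python
import math

# Matching pennies: the row player maximizes x^T A y, the column player minimizes it.
MATCHING_PENNIES = [[1.0, -1.0], [-1.0, 1.0]]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hedge_time_average(A, steps=20000, eta=0.01, row_init=(math.log(9.0), 0.0)):
    """Run Hedge for both players and return their time-averaged strategies.

    row_init is a hypothetical biased start (row player begins at 0.9 / 0.1)
    so that the dynamics do not sit at the equilibrium from the first step.
    """
    n, m = len(A), len(A[0])
    row_logits = list(row_init)
    col_logits = [0.0] * m
    row_sum = [0.0] * n
    col_sum = [0.0] * m
    for _ in range(steps):
        x = softmax(row_logits)
        y = softmax(col_logits)
        row_sum = [s + p for s, p in zip(row_sum, x)]
        col_sum = [s + p for s, p in zip(col_sum, y)]
        # Expected payoff of each pure action against the opponent's current mix.
        row_gain = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_loss = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        row_logits = [l + eta * g for l, g in zip(row_logits, row_gain)]
        col_logits = [l - eta * c for l, c in zip(col_logits, col_loss)]
    return [s / steps for s in row_sum], [s / steps for s in col_sum]


x_bar, y_bar = hedge_time_average(MATCHING_PENNIES)
print(x_bar, y_bar)
```

With these settings both averaged strategies should land close to the uniform equilibrium (0.5, 0.5); the sketch makes no claim about the behavior of the individual iterates.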

1 code implementation • 25 Sep 2021 • Liyuan Zheng, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov, Lillian J. Ratliff

The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation.

1 code implementation • ICLR 2022 • Tanner Fiez, Chi Jin, Praneeth Netrapalli, Lillian J. Ratliff

This paper considers minimax optimization $\min_x \max_y f(x, y)$ in the challenging setting where $f$ can be both nonconvex in $x$ and nonconcave in $y$.

no code implementations • NeurIPS 2021 • Tanner Fiez, Lillian J Ratliff, Eric Mazumdar, Evan Faulkner, Adhyyan Narang

For the class of nonconvex-PL zero-sum games, we exploit timescale separation to construct a potential function that when combined with the stability characterization and an asymptotic saddle avoidance result gives a global asymptotic almost-sure convergence guarantee to a set of the strict local minmax equilibrium.

1 code implementation • 15 Dec 2020 • Stratis Skoulakis, Tanner Fiez, Ryann Sim, Georgios Piliouras, Lillian Ratliff

The predominant paradigm in evolutionary game theory, and more generally in online learning in games, is based on a clear distinction between a population of dynamic agents and the fixed, static game in which they interact.

1 code implementation • ICLR 2021 • Tanner Fiez, Lillian Ratliff

In this work, we bridge the gap between past work by showing there exists a finite timescale separation parameter $\tau^{\ast}$ such that $x^{\ast}$ is a stable critical point of gradient descent-ascent for all $\tau \in (\tau^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium.
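A toy sketch of the role of $\tau$, assuming the quadratic $f(x, y) = -x^2/2 + xy - y^2/4$ (an illustrative choice, not from the paper) and assuming $\tau$ scales the maximizing player's step. The origin is a strict local minmax point of this $f$, and the linearized dynamics have trace $1 - \tau/2$ and determinant $\tau/2$, so for this example the threshold works out to $\tau^{\ast} = 2$.

```python
def simulate_tau_gda(tau, steps=5000, lr=0.01, start=(0.1, 0.1)):
    """Run tau-GDA on f(x, y) = -x**2/2 + x*y - y**2/4.

    x descends with step lr, y ascends with step lr * tau.
    Returns the final Euclidean distance to the critical point (0, 0).
    """
    x, y = start
    for _ in range(steps):
        grad_x = -x + y        # df/dx
        grad_y = x - 0.5 * y   # df/dy
        x, y = x - lr * grad_x, y + lr * tau * grad_y
    return (x * x + y * y) ** 0.5


print(simulate_tau_gda(1.0))  # tau below the toy threshold: iterates drift away
print(simulate_tau_gda(4.0))  # tau above the toy threshold: iterates contract
```

In this example the same critical point is unstable at $\tau = 1$ and stable at $\tau = 4$, which matches the qualitative statement above for a single hand-picked instance only.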

1 code implementation • 27 Jun 2020 • Tanner Fiez, Nihar B. Shah, Lillian Ratliff

Theoretically, we show a local optimality guarantee of our algorithm and prove that popular baselines are considerably suboptimal.

1 code implementation • NeurIPS 2019 • Tanner Fiez, Lalit Jain, Kevin Jamieson, Lillian Ratliff

Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost.

1 code implementation • 4 Jun 2019 • Tanner Fiez, Benjamin Chasnov, Lillian J. Ratliff

Using this insight, we develop a gradient-based update for the leader while the follower employs a best response strategy for which each stable critical point is guaranteed to be a Stackelberg equilibrium in zero-sum games.
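A minimal sketch of a leader update of this kind, assuming the zero-sum quadratic $f(x, y) = -x^2/2 + xy - y^2/4$ with the leader minimizing over $x$ and the follower maximizing over $y$. The function, step size, and starting point are illustrative assumptions; the leader follows the total derivative $\nabla_x f - \nabla_{xy} f \, (\nabla_{yy} f)^{-1} \nabla_y f$ while the follower plays its exact best response.

```python
def stackelberg_leader_descent(steps=200, lr=0.1, x0=1.0):
    """Leader gradient descent on f(x, r(x)) with follower best response r(x).

    Toy cost: f(x, y) = -x**2/2 + x*y - y**2/4 (leader minimizes, follower maximizes).
    Returns the final (x, y) pair.
    """
    x = x0
    for _ in range(steps):
        y = 2.0 * x             # best response: solves df/dy = x - y/2 = 0
        grad_x = -x + y         # partial derivative df/dx
        grad_y = x - 0.5 * y    # partial derivative df/dy (zero at the best response)
        d2f_xy = 1.0            # mixed second derivative
        d2f_yy = -0.5           # follower curvature (negative: strict maximum)
        total_grad = grad_x - d2f_xy * grad_y / d2f_yy
        x -= lr * total_grad
    return x, 2.0 * x


print(stackelberg_leader_descent())
```

Along the best-response curve the leader's cost reduces to $x^2/2$, so the iterates contract toward $(0, 0)$ in this example.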

no code implementations • 6 Jul 2018 • Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge.

no code implementations • 11 Mar 2018 • Tanner Fiez, Shreyas Sekar, Lillian J. Ratliff

We analyze these algorithms under two types of smoothed reward feedback at the end of each epoch: a reward that is the discount-average of the discounted rewards within an epoch, and a reward that is the time-average of the rewards within an epoch.
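To make the two feedback types concrete, the sketch below computes both from the rewards of one epoch. The normalization used for the discounted case (dividing by the sum of the discount weights) is an assumption about what "discount-average" means here, and the function names and sample values are hypothetical.

```python
def time_average(rewards):
    """Plain average of the rewards observed within an epoch."""
    return sum(rewards) / len(rewards)


def discount_average(rewards, gamma):
    """Discount-weighted average: sum(gamma**k * r_k) / sum(gamma**k).

    Assumed normalization; the source does not spell out the formula.
    """
    weights = [gamma ** k for k in range(len(rewards))]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


epoch_rewards = [1.0, 0.0, 1.0, 1.0]
print(time_average(epoch_rewards))
print(discount_average(epoch_rewards, gamma=0.5))
```

Under this reading, the discounted version puts more weight on rewards received early in the epoch, while the time-average treats every step equally.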
