Search Results for author: Tanner Fiez

Found 15 papers, 7 papers with code

Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study

no code implementations ICML 2020 Tanner Fiez, Benjamin Chasnov, Lillian Ratliff

Contemporary work on learning in continuous games has commonly overlooked the hierarchical decision-making structure present in machine learning problems formulated as games, instead treating them as simultaneous-play games and adopting the Nash equilibrium solution concept.

Decision Making

Best of Three Worlds: Adaptive Experimentation for Digital Marketing in Practice

no code implementations 16 Feb 2024 Tanner Fiez, Houssam Nassif, Yu-cheng Chen, Sergio Gamez, Lalit Jain

Adaptive experimental design (AED) methods are increasingly being used in industry as a tool to boost testing throughput or reduce experimentation cost relative to traditional A/B/N testing methods.

Counterfactual Inference +2

Neural Insights for Digital Marketing Content Design

no code implementations 2 Feb 2023 Fanjie Kong, Yuan Li, Houssam Nassif, Tanner Fiez, Ricardo Henao, Shreya Chakrabarti

In digital marketing, experimenting with new website content is one of the key levers to improve customer engagement.

Marketing

Adaptive Experimental Design and Counterfactual Inference

no code implementations 25 Oct 2022 Tanner Fiez, Sergio Gamez, Arick Chen, Houssam Nassif, Lalit Jain

Adaptive experimental design methods are increasingly being used in industry as a tool to boost testing throughput or reduce experimentation cost relative to traditional A/B/N testing methods.

Counterfactual Inference +1

Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games

no code implementations NeurIPS 2021 Tanner Fiez, Lillian Ratliff, Eric Mazumdar, Evan Faulkner, Adhyyan Narang

For the class of nonconvex-PL zero-sum games, we exploit timescale separation to construct a potential function that, when combined with the stability characterization and an asymptotic saddle-avoidance result, gives a global asymptotic almost-sure convergence guarantee to the set of strict local minmax equilibria.
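
For context, here is a minimal sketch of the objects that sentence refers to, with the notation assumed rather than taken from the paper:

```latex
% Timescale-separated gradient descent-ascent on a zero-sum objective f(x, y):
% the minimizing player moves on a slower timescale than the maximizing player.
\[
x_{k+1} = x_k - \gamma_k \nabla_x f(x_k, y_k), \qquad
y_{k+1} = y_k + \tau \gamma_k \nabla_y f(x_k, y_k), \qquad \tau > 1 .
\]
% "Nonconvex-PL" assumes f(x, .) satisfies a Polyak-Lojasiewicz inequality in y,
\[
\tfrac{1}{2} \lVert \nabla_y f(x, y) \rVert^2
  \ge \mu \bigl( \textstyle\max_{y'} f(x, y') - f(x, y) \bigr),
\]
% so the envelope \phi(x) = \max_y f(x, y) is well defined and can serve as
% a potential function for the slow (descent) dynamics.
```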

Online Learning in Periodic Zero-Sum Games

no code implementations NeurIPS 2021 Tanner Fiez, Ryann Sim, Stratis Skoulakis, Georgios Piliouras, Lillian Ratliff

Classical learning results build on von Neumann's minmax theorem to show that online no-regret dynamics converge to an equilibrium in a time-average sense in zero-sum games.
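
As a concrete illustration of that time-average statement (a standard textbook example, not code from the paper): multiplicative-weights dynamics in matching pennies cycle from day to day, yet the running averages of the strategies approach the unique Nash equilibrium.

```python
import numpy as np

# Matching pennies: zero-sum, unique Nash equilibrium at uniform play.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # row player's payoffs; the column player gets -A

eta = 0.05                    # learning rate
x = np.array([0.6, 0.4])      # row player's mixed strategy (off-equilibrium start)
y = np.array([0.3, 0.7])      # column player's mixed strategy
x_sum, y_sum = np.zeros(2), np.zeros(2)

T = 20000
for _ in range(T):
    x_sum, y_sum = x_sum + x, y_sum + y
    gx, gy = A @ y, -(A.T @ x)               # each player's payoff vector
    x = x * np.exp(eta * gx); x /= x.sum()   # multiplicative-weights update
    y = y * np.exp(eta * gy); y /= y.sum()

print(x_sum / T, y_sum / T)   # both time-averages ~ [0.5, 0.5]
```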

Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms

1 code implementation 25 Sep 2021 Liyuan Zheng, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov, Lillian J. Ratliff

The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation.

OpenAI Gym, Reinforcement Learning +1

Minimax Optimization with Smooth Algorithmic Adversaries

1 code implementation ICLR 2022 Tanner Fiez, Chi Jin, Praneeth Netrapalli, Lillian J. Ratliff

This paper considers minimax optimization $\min_x \max_y f(x, y)$ in the challenging setting where $f$ can be both nonconvex in $x$ and nonconcave in $y$.
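
A minimal sketch of playing against an algorithmic adversary (the toy objective, step sizes, and inner-loop count below are assumptions for illustration, not the paper's algorithm): the y-player runs a fixed number of smooth gradient-ascent steps, and the x-player descends against that adversary's output.

```python
# Toy objective f(x, y) = x*y - y**2/2 + 0.1*x**3 (nonconvex in x; kept
# concave in y here only so the inner ascent loop is well behaved).
def grad_x(x, y):
    return y + 0.3 * x**2    # df/dx

def grad_y(x, y):
    return x - y             # df/dy

x, y = 1.0, -1.0
eta_x, eta_y, K = 0.01, 0.1, 10

for _ in range(2000):
    for _ in range(K):             # adversary: K smooth gradient-ascent steps
        y += eta_y * grad_y(x, y)
    x -= eta_x * grad_x(x, y)      # min player: descend against the output

print(x, y)   # both settle near the stationary point (0, 0)
```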

Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

1 code implementation 15 Dec 2020 Stratis Skoulakis, Tanner Fiez, Ryann Sim, Georgios Piliouras, Lillian Ratliff

The predominant paradigm in evolutionary game theory, and more generally online learning in games, is based on a clear distinction between a population of dynamic agents and the fixed, static game in which they interact.

Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

1 code implementation ICLR 2021 Tanner Fiez, Lillian Ratliff

In this work, we bridge the gap between past work by showing there exists a finite timescale separation parameter $\tau^{\ast}$ such that $x^{\ast}$ is a stable critical point of gradient descent-ascent for all $\tau \in (\tau^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium.
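
A toy numerical check of that statement (the quadratic below and its threshold are illustrative assumptions, not an example from the paper): for $f(x, y) = -x^2/2 + 3xy - y^2/2$ the origin is a strict local minmax ($f$ is concave in $x$, but the Schur complement $-1 + 9 = 8$ is positive), and linearizing the dynamics gives $\tau^{\ast} = 1$.

```python
def tau_gda(x, y, tau, eta=0.01, steps=20000):
    """Simultaneous gradient descent-ascent with timescale separation tau."""
    for _ in range(steps):
        gx = -x + 3.0 * y          # df/dx for f = -x**2/2 + 3*x*y - y**2/2
        gy = 3.0 * x - y           # df/dy
        x, y = x - eta * gx, y + tau * eta * gy
    return x, y

print(tau_gda(0.5, 0.5, tau=0.5))  # below tau*: iterates spiral away from (0, 0)
print(tau_gda(0.5, 0.5, tau=4.0))  # above tau*: iterates converge to (0, 0)
```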

A SUPER* Algorithm to Optimize Paper Bidding in Peer Review

1 code implementation 27 Jun 2020 Tanner Fiez, Nihar B. Shah, Lillian Ratliff

Theoretically, we establish a local optimality guarantee for our algorithm and prove that popular baselines are considerably suboptimal.

Sequential Experimental Design for Transductive Linear Bandits

1 code implementation NeurIPS 2019 Tanner Fiez, Lalit Jain, Kevin Jamieson, Lillian Ratliff

Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost.

Drug Discovery, Experimental Design +1

Convergence of Learning Dynamics in Stackelberg Games

1 code implementation 4 Jun 2019 Tanner Fiez, Benjamin Chasnov, Lillian J. Ratliff

Using this insight, we develop a gradient-based update for the leader while the follower employs a best response strategy for which each stable critical point is guaranteed to be a Stackelberg equilibrium in zero-sum games.
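
A minimal scalar sketch of such a leader update (the quadratic game and step sizes are assumptions for illustration): the follower approximately best-responds by gradient ascent, and the leader descends the total gradient obtained from the implicit function theorem, $\nabla_x f - \nabla_{xy} f \, (\nabla_{yy} f)^{-1} \nabla_y f$.

```python
# Zero-sum quadratic toy: f(x, y) = A*x**2/2 + B*x*y + C*y**2/2,
# leader x minimizes, follower y maximizes (C < 0 keeps the follower's
# problem well posed). All constants are illustrative assumptions.
A, B, C = 2.0, 3.0, -1.0
x, y = 1.0, 1.0
eta_x, eta_y = 0.05, 0.1

for _ in range(500):
    for _ in range(5):                  # follower: approximate best response
        y += eta_y * (B * x + C * y)    # gradient ascent on f in y
    gx, gy = A * x + B * y, B * x + C * y
    implicit = gx - B * (1.0 / C) * gy  # leader's total (Stackelberg) gradient
    x -= eta_x * implicit

print(x, y)   # approaches the Stackelberg equilibrium of this toy at (0, 0)
```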

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

no code implementations 6 Jul 2018 Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platforms continue to emerge.

Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback

no code implementations 11 Mar 2018 Tanner Fiez, Shreyas Sekar, Lillian J. Ratliff

We analyze these algorithms under two types of smoothed reward feedback at the end of each epoch: a reward that is the discount-average of the discounted rewards within an epoch, and a reward that is the time-average of the rewards within an epoch.
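
The two smoothed feedback signals can be written directly (the notation, for an epoch of rewards $r_0, \dots, r_{n-1}$ and discount factor $\gamma$, is assumed rather than taken from the paper):

```python
import numpy as np

def discount_average(rewards, gamma):
    """Discount-average: sum_t gamma**t * r_t / sum_t gamma**t."""
    w = gamma ** np.arange(len(rewards))
    return np.dot(w, rewards) / w.sum()

def time_average(rewards):
    """Time-average: plain mean of the epoch's rewards."""
    return float(np.mean(rewards))

epoch = np.array([1.0, 0.0, 0.0, 1.0, 1.0])
print(discount_average(epoch, gamma=0.9), time_average(epoch))
```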

Multi-Armed Bandits, Q-Learning
