Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

no code implementations 15 Jun 2024 Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games.

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

no code implementations 1 Nov 2023 Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice.
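
For readers unfamiliar with RM$^+$, the following is a minimal sketch of its standard textbook form in self-play on a zero-sum matrix game. It is not taken from the paper: the payoff matrix, the function names `rm_plus` and `_strategy`, the simultaneous updates, and the uniform averaging of iterates are all illustrative assumptions.

```python
import numpy as np


def _strategy(Q):
    """Play proportionally to the clipped regrets; uniform if all are zero."""
    total = Q.sum()
    return Q / total if total > 0 else np.full(Q.size, 1.0 / Q.size)


def rm_plus(A, steps=10000):
    """Self-play RM+ for min_x max_y x^T A y (hypothetical helper).

    Returns the uniformly averaged strategies of both players.
    """
    n, m = A.shape
    Qx, Qy = np.zeros(n), np.zeros(m)        # clipped cumulative regrets
    x_sum, y_sum = np.zeros(n), np.zeros(m)  # running sums for averaging
    for _ in range(steps):
        x, y = _strategy(Qx), _strategy(Qy)
        x_sum += x
        y_sum += y
        loss_x = A @ y      # loss of each row action against y
        util_y = A.T @ x    # utility of each column action against x
        # RM+ step: add the instantaneous regret, then clip at zero.
        Qx = np.maximum(Qx + (x @ loss_x - loss_x), 0.0)
        Qy = np.maximum(Qy + (util_y - y @ util_y), 0.0)
    return x_sum / steps, y_sum / steps


def duality_gap(A, x, y):
    """Best-response gap of (x, y); zero exactly at a Nash equilibrium."""
    return float(np.max(A.T @ x) - np.min(A @ y))


# Illustrative 2x2 game (an assumption, not from the paper).
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_avg, y_avg = rm_plus(A)
print(duality_gap(A, x_avg, y_avg))
```

The clipping at zero is the only difference from plain regret matching. The sketch reports the averaged strategies, whereas the paper above studies the behavior of the last iterate.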

Practical Knowledge Distillation: Using DNNs to Beat DNNs

no code implementations 23 Feb 2023 Chung-Wei Lee, Pavlos Athanasios Apostolopulos, Igor L. Markov

While gradient boosting is known to outperform DNNs on tabular data, we close the gap for datasets with 100K+ rows and give DNNs an advantage on small data sets.

Denoising, Knowledge Distillation

Near-Optimal No-Regret Learning Dynamics for General Convex Games

no code implementations 17 Jun 2022 Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm

In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general convex games, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets.

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

no code implementations 25 Apr 2022 Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$.

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

no code implementations 1 Feb 2022 Gabriele Farina, Chung-Wei Lee, Haipeng Luo, Christian Kroer

In this paper we show that the Optimistic Multiplicative Weights Update (OMWU) algorithm -- the premier learning algorithm for NFGs -- can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick.

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

no code implementations NeurIPS 2021 Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

When a simulator is unavailable, we further consider a linear MDP setting and obtain $\widetilde{\mathcal{O}}({T}^{14/15})$ regret, which is the first result for linear MDPs with adversarial losses and bandit feedback.

Last-iterate Convergence in Extensive-Form Games

no code implementations NeurIPS 2021 Chung-Wei Lee, Christian Kroer, Haipeng Luo

Inspired by recent advances on last-iterate convergence of optimistic algorithms in zero-sum normal-form games, we study this phenomenon in sequential games, and provide a comprehensive study of last-iterate convergence for zero-sum extensive-form games with perfect recall (EFGs), using various optimistic regret-minimization algorithms over treeplexes.


Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

no code implementations 8 Feb 2021 Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play.

Linear Last-iterate Convergence in Constrained Saddle-point Optimization

1 code implementation ICLR 2021 Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

Specifically, for OMWU in bilinear games over the simplex, we show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant, improving the result of (Daskalakis & Panageas, 2019b) under the same assumption.
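
As an illustration of the setting, here is a minimal sketch of OMWU on a bilinear game over the simplex with a constant learning rate. It uses the common single-sequence form of the update, which may differ from the exact variant analyzed in the paper. The payoff matrix, the value `eta=0.1`, and the function names `omwu` and `duality_gap` are illustrative assumptions.

```python
import numpy as np


def omwu(A, eta=0.1, steps=5000):
    """Self-play OMWU for min_x max_y x^T A y (hypothetical helper).

    Each player takes a multiplicative-weights step along the optimistic
    gradient 2 * g_t - g_{t-1}. Returns the last iterates, not averages.
    """
    n, m = A.shape
    x = np.full(n, 1.0 / n)
    y = np.full(m, 1.0 / m)
    gx_prev, gy_prev = A @ y, A.T @ x  # makes the first step a plain MWU step
    for _ in range(steps):
        gx, gy = A @ y, A.T @ x
        x_new = x * np.exp(-eta * (2.0 * gx - gx_prev))  # minimizer descends
        y_new = y * np.exp(eta * (2.0 * gy - gy_prev))   # maximizer ascends
        x, y = x_new / x_new.sum(), y_new / y_new.sum()
        gx_prev, gy_prev = gx, gy
    return x, y


def duality_gap(A, x, y):
    """Best-response gap of (x, y); zero exactly at a Nash equilibrium."""
    return float(np.max(A.T @ x) - np.min(A @ y))


# Illustrative 2x2 game whose unique equilibrium is x = y = (0.4, 0.6).
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_last, y_last = omwu(A)
print(x_last, y_last, duality_gap(A, x_last, y_last))
```

The sketch inspects the last iterate directly instead of an average, which is the quantity the result above concerns.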

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

no code implementations NeurIPS 2020 Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

no code implementations 2 Feb 2020 Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang

We study small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds.

Multi-Armed Bandits

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free

no code implementations 3 Feb 2019 Yifang Chen, Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei

We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret.

Multi-Armed Bandits

Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

1 code implementation CVPR 2018 Chung-Wei Lee, Wei Fang, Chih-Kuan Yeh, Yu-Chiang Frank Wang

In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance.

General Classification, Knowledge Graphs

