1 code implementation • 30 Oct 2024 • Yixin Liu, Argyris Oikonomou, Weiqiang Zheng, Yang Cai, Arman Cohan
To achieve robust alignment with general preferences, we model the alignment problem as a two-player zero-sum game, where the Nash equilibrium policy guarantees a 50% win rate against any competing policy.
no code implementations • 15 Jun 2024 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games.
no code implementations • 13 Mar 2024 • Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave -- a common scenario in machine learning applications involving strategies parameterized by deep neural networks, or when agents' utilities are computed by neural networks, or both.
no code implementations • 26 Jan 2024 • Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium.
no code implementations • 7 Dec 2023 • Jiahao Zhang, Tao Lin, Weiqiang Zheng, Zhe Feng, Yifeng Teng, Xiaotie Deng
In this paper, we investigate a problem of actively learning threshold in latent space, where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and latent value $v$ and it can be $only$ achieved if the threshold is lower than or equal to the unknown latent value.
no code implementations • 1 Nov 2023 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice.
1 code implementation • 30 Jan 2023 • Yang Cai, Weiqiang Zheng
We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games.
no code implementations • 6 Oct 2022 • Yang Cai, Weiqiang Zheng
Finally, we show that the Reflected Gradient (RG) method, another single-call single-projection algorithm, has $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate for constrained convex-concave min-max optimization, answering an open problem of [Heish et al, 2019].
no code implementations • 10 Jun 2022 • Yang Cai, Argyris Oikonomou, Weiqiang Zheng
In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods.
no code implementations • 20 Apr 2022 • Yang Cai, Argyris Oikonomou, Weiqiang Zheng
We use the tangent residual (or a slight variation of the tangent residual) as the the potential function in our analysis of the extragradient algorithm (or the optimistic gradient descent-ascent algorithm) and prove that it is non-increasing between two consecutive iterates.
1 code implementation • 8 Oct 2021 • Xiaotie Deng, Xinyan Hu, Tao Lin, Weiqiang Zheng
Specifically, the results depend on the number of bidders with the highest value: - If the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in time-average and in last-iterate.