Search Results for author: Weiqiang Zheng

Found 11 papers, 3 papers with code

COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences

1 code implementation • 30 Oct 2024 • Yixin Liu, Argyris Oikonomou, Weiqiang Zheng, Yang Cai, Arman Cohan

To achieve robust alignment with general preferences, we model the alignment problem as a two-player zero-sum game, where the Nash equilibrium policy guarantees a 50% win rate against any competing policy.

Language Modelling
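
As a concrete illustration of the equilibrium claim, the sketch below (not COMAL itself; the preference matrix, learning rate, and horizon are invented) computes an approximate Nash policy of a small zero-sum preference game via multiplicative-weights self-play and checks that its worst-case win rate is about 50%.

```python
import numpy as np

# Hypothetical preference matrix: P[i, j] = Pr[policy i beats policy j].
P = np.array([[0.5, 0.7, 0.2],
              [0.3, 0.5, 0.9],
              [0.8, 0.1, 0.5]])
A = P - 0.5                          # antisymmetric payoff of the zero-sum game

# Multiplicative-weights self-play; the time-averaged strategy approximates
# the Nash equilibrium policy (both "players" are copies of the same policy).
x = np.ones(3) / 3
avg, eta, T = np.zeros(3), 0.05, 20000
for _ in range(T):
    x = x * np.exp(eta * (A @ x))
    x /= x.sum()
    avg += x / T

# Win rate of the approximate Nash policy against the worst opponent;
# checking pure opponents suffices because the win rate is linear in them.
print("worst-case win rate:", (avg @ P).min())   # -> approximately 0.5
```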

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

no code implementations • 15 Jun 2024 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games.
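
For concreteness, here is a minimal sketch of OMWU on matching pennies (the payoff matrix, step size, and starting points are my own choices, not from the paper), showing the last-iterate behavior the paper contrasts with ergodic convergence.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])          # matching pennies; unique Nash (1/2, 1/2)
eta = 0.05
x, y = np.array([0.8, 0.2]), np.array([0.3, 0.7])
gx_prev, gy_prev = A @ y, A.T @ x

for t in range(20000):
    gx, gy = A @ y, A.T @ x                    # current payoff gradients
    x = x * np.exp(eta * (2 * gx - gx_prev))   # optimistic: extrapolated gradient
    y = y * np.exp(-eta * (2 * gy - gy_prev))  # min player descends
    x, y = x / x.sum(), y / y.sum()
    gx_prev, gy_prev = gx, gy

# duality gap of the LAST iterate (zero exactly at the Nash equilibrium)
print(x, y, "gap:", (A @ y).max() - (A.T @ x).min())
```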

On Tractable $\Phi$-Equilibria in Non-Concave Games

no code implementations • 13 Mar 2024 • Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave -- a common scenario in machine learning applications involving strategies parameterized by deep neural networks, or when agents' utilities are computed by neural networks, or both.
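
As a minimal illustration of the no-regret procedure referenced above, the sketch below runs projected Online Gradient Descent (ascent on each agent's own utility) in a toy two-player concave game; the utilities, strategy set, and step size are invented. With concave utilities the dynamics settle at equilibrium, which is exactly what can break in the non-concave case.

```python
import numpy as np

def proj(z, lo=-1.0, hi=1.0):        # projection onto the strategy set [-1, 1]
    return float(np.clip(z, lo, hi))

# Concave utilities: u1(a, b) = -(a - b/2)^2 and u2(a, b) = -(b - a/2)^2.
a, b, eta = 0.9, -0.7, 0.1
for t in range(300):
    ga = -2 * (a - b / 2)            # gradient of u1 in the agent's own action
    gb = -2 * (b - a / 2)
    a, b = proj(a + eta * ga), proj(b + eta * gb)   # simultaneous ascent
print(a, b)   # -> (0, 0), the unique equilibrium of this concave game
```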

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

no code implementations • 26 Jan 2024 • Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium.
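
As background for the solution concept, the sketch below checks the one-shot correlated-equilibrium condition in a normal-form game; the "chicken" payoffs and mediator distribution are textbook choices, not from the paper, and the Markov-game result above concerns how fast uncoupled dynamics drive this gap to zero across states and time.

```python
import numpy as np

# "Chicken": actions 0 = Stop, 1 = Go; U[i][a_row][a_col] = payoff of player i.
U = np.array([[[4, 2],
               [5, 0]],
              [[4, 5],
               [2, 0]]])
# Mediator's joint distribution: (Stop, Go) and (Go, Stop) w.p. 1/2 each.
mu = np.array([[0.0, 0.5],
               [0.5, 0.0]])

def ce_gap(U, mu):
    """Largest conditional gain any player gets from disobeying a recommendation."""
    gap = 0.0
    for i in (0, 1):                  # player
        for a in (0, 1):              # recommended action
            for b in (0, 1):          # candidate deviation
                cond = mu[a, :] if i == 0 else mu[:, a]      # (unnormalized)
                obey = U[i, a, :] if i == 0 else U[i, :, a]
                dev = U[i, b, :] if i == 0 else U[i, :, b]
                gap = max(gap, float(cond @ (dev - obey)))
    return gap

print("CE gap:", ce_gap(U, mu))       # -> 0.0: mu is a correlated equilibrium
```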

Learning Thresholds with Latent Values and Censored Feedback

no code implementations • 7 Dec 2023 • Jiahao Zhang, Tao Lin, Weiqiang Zheng, Zhe Feng, Yifeng Teng, Xiaotie Deng

In this paper, we investigate the problem of actively learning a threshold in a latent space, where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and a latent value $v$, and the reward can only be obtained if the threshold is lower than or equal to the unknown latent value.
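
The interaction protocol can be made concrete with the following sketch; the value distribution, the reward form (a posted price), the threshold grid, and the epsilon-greedy learner are all invented for illustration and are not the paper's algorithm.

```python
import random

def post(gamma):
    """One round: latent v ~ U[0, 1]; the reward gamma is observed only if
    the threshold clears, i.e. gamma <= v (censored feedback)."""
    v = random.random()
    return gamma if gamma <= v else 0.0

grid = [i / 20 for i in range(21)]              # discretized thresholds
means, counts = [0.0] * 21, [0] * 21
for t in range(50000):
    k = (random.randrange(21) if random.random() < 0.1
         else max(range(21), key=lambda i: means[i]))   # epsilon-greedy
    r = post(grid[k])
    counts[k] += 1
    means[k] += (r - means[k]) / counts[k]

print("learned threshold:", grid[max(range(21), key=lambda i: means[i])])
# the expected reward here is gamma * (1 - gamma), so the optimum is near 0.5
```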

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

no code implementations • 1 Nov 2023 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$) and its variants, are the most popular approaches for solving large-scale two-player zero-sum games in practice.
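
For reference, here is a minimal sketch of RM$^+$ self-play on rock-paper-scissors (payoffs and horizon are my own choices). The paper's question is precisely whether the last iterates of such dynamics converge; in examples like this one they can cycle while the averages converge.

```python
import numpy as np

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])      # rock-paper-scissors payoffs (row player)

def strat(Q):                         # play proportional to positive regret
    s = Q.sum()
    return Q / s if s > 0 else np.ones(len(Q)) / len(Q)

Qx, Qy = np.zeros(3), np.zeros(3)
for t in range(10000):
    x, y = strat(Qx), strat(Qy)
    ux, uy = A @ y, -A.T @ x          # utility vectors for row / column player
    Qx = np.maximum(0.0, Qx + ux - x @ ux)   # RM+ clips accumulated regret
    Qy = np.maximum(0.0, Qy + uy - y @ uy)   # at zero ("forgets" the negative)

print(strat(Qx), strat(Qy))   # last iterates may keep cycling around (1/3, 1/3, 1/3)
```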

Doubly Optimal No-Regret Learning in Monotone Games

1 code implementation • 30 Jan 2023 • Yang Cai, Weiqiang Zheng

We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games.
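
Not the authors' AOG: as a baseline sketch, the code below runs the plain optimistic gradient method, the no-regret dynamic that AOG accelerates, on an invented bilinear monotone game with an invented step size.

```python
import numpy as np

def F(z):                             # operator of min_x max_y  x * y
    x, y = z
    return np.array([y, -x])

z, g_prev, eta = np.array([1.0, 1.0]), np.zeros(2), 0.1
for t in range(2000):
    g = F(z)
    z = z - eta * (2 * g - g_prev)    # optimistic step: extrapolate the gradient
    g_prev = g
print(z, "||F(z)||:", np.linalg.norm(F(z)))   # -> near the saddle point (0, 0)
```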

Accelerated Single-Call Methods for Constrained Min-Max Optimization

no code implementations • 6 Oct 2022 • Yang Cai, Weiqiang Zheng

Finally, we show that the Reflected Gradient (RG) method, another single-call single-projection algorithm, has an $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate for constrained convex-concave min-max optimization, answering an open problem of [Hsieh et al., 2019].
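
The RG update is easy to state; below is a minimal sketch on an invented box-constrained bilinear problem (operator and step size are my own choices), highlighting that each iteration uses a single gradient call and a single projection.

```python
import numpy as np

def F(z):                             # operator of min_x max_y  x * y
    x, y = z
    return np.array([y, -x])

def proj(z):                          # projection onto the box [-1, 1]^2
    return np.clip(z, -1.0, 1.0)

z_prev = z = np.array([1.0, 0.5])
eta = 0.1
for t in range(3000):
    refl = 2 * z - z_prev             # the "reflected" point
    z_prev, z = z, proj(z - eta * F(refl))  # one gradient call, one projection
print(z, "||F(z)||:", np.linalg.norm(F(z)))  # -> approaches the saddle (0, 0)
```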

Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion

no code implementations • 10 Jun 2022 • Yang Cai, Argyris Oikonomou, Weiqiang Zheng

In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods.
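
For orientation, here is a hedged sketch of the unconstrained EAG template of Yoon and Ryu (2021) that the paper extends; the operator, step size, and horizon are invented for illustration.

```python
import numpy as np

def F(z):                             # monotone operator: bilinear saddle x * y
    x, y = z
    return np.array([y, -x])

z0 = z = np.array([1.0, 1.0])
eta = 0.1
for k in range(2000):
    beta = 1.0 / (k + 1)                      # anchoring weight, decaying to 0
    zh = z + beta * (z0 - z) - eta * F(z)     # extra(gradient) half step
    z = z + beta * (z0 - z) - eta * F(zh)     # update, pulled toward the anchor z0
print(z, "||F(z)||:", np.linalg.norm(F(z)))   # the residual shrinks as O(1/T)
```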

Tight Last-Iterate Convergence of the Extragradient and the Optimistic Gradient Descent-Ascent Algorithm for Constrained Monotone Variational Inequalities

no code implementations • 20 Apr 2022 • Yang Cai, Argyris Oikonomou, Weiqiang Zheng

We use the tangent residual (or a slight variation of the tangent residual) as the potential function in our analysis of the extragradient algorithm (or the optimistic gradient descent-ascent algorithm) and prove that it is non-increasing between two consecutive iterates.
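
A minimal sketch, with an invented operator, box constraint, and step size, of the extragradient algorithm with the tangent residual monitored along the iterates.

```python
import numpy as np

LO, HI = -1.0, 1.0
def F(z):                             # a strongly monotone operator
    x, y = z
    return np.array([y + 0.1 * x, -x + 0.1 * y])

def proj(z):
    return np.clip(z, LO, HI)

def tangent_residual(z):
    """Norm of -F(z) projected onto the tangent cone of the box at z."""
    g = -F(z)
    g = np.where((z >= HI) & (g > 0), 0.0, g)   # outward components are blocked
    g = np.where((z <= LO) & (g < 0), 0.0, g)
    return np.linalg.norm(g)

z, eta = np.array([1.0, -1.0]), 0.2
for t in range(101):
    zh = proj(z - eta * F(z))         # exploration step
    z = proj(z - eta * F(zh))         # update step uses the explored gradient
    if t % 25 == 0:
        print(t, tangent_residual(z)) # non-increasing along the iterates
```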

Nash Convergence of Mean-Based Learning Algorithms in First Price Auctions

1 code implementation • 8 Oct 2021 • Xiaotie Deng, Xinyan Hu, Tao Lin, Weiqiang Zheng

Specifically, the results depend on the number of bidders with the highest value:
- If the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in time-average and in last-iterate.
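
A hedged sketch of one mean-based dynamic (epsilon-greedy over discretized bids; the common value, bid grid, exploration rate, and horizon are invented) in a first-price auction with three bidders tied at the highest value.

```python
import random

V, BIDS = 1.0, [i / 10 for i in range(11)]      # common value, bid grid
means = [[0.0] * 11 for _ in range(3)]
counts = [[0] * 11 for _ in range(3)]

for t in range(100000):
    ks = [(random.randrange(11) if random.random() < 0.05
           else max(range(11), key=lambda j: means[i][j])) for i in range(3)]
    top = max(BIDS[k] for k in ks)
    w = random.choice([i for i in range(3) if BIDS[ks[i]] == top])  # tie-break
    for i in range(3):
        u = (V - BIDS[ks[i]]) if i == w else 0.0   # first-price utility
        counts[i][ks[i]] += 1
        means[i][ks[i]] += (u - means[i][ks[i]]) / counts[i][ks[i]]

print([BIDS[max(range(11), key=lambda j: means[i][j])] for i in range(3)])
# bids typically escalate toward the common value; on this grid, all three
# bidders playing 0.9 is a Nash equilibrium of the discretized auction
```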
