no code implementations • 4 Mar 2025 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
Non-ergodic convergence of learning dynamics in games has recently been widely studied because of its importance in both theory and practice.
no code implementations • 29 Dec 2024 • Yang Cai, Siddharth Mitra, Xiuyuan Wang, Andre Wibisono
We prove an exponential convergence guarantee for the mean-field min-max Langevin dynamics to compute the equilibrium distribution of the zero-sum game.
no code implementations • 1 Dec 2024 • Yang Cai, Xiangyu Liu, Argyris Oikonomou, Kaiqing Zhang
Partial observability of the underlying states generally presents significant challenges for reinforcement learning (RL).
1 code implementation • 30 Oct 2024 • Yixin Liu, Argyris Oikonomou, Weiqiang Zheng, Yang Cai, Arman Cohan
To achieve robust alignment with general preferences, we model the alignment problem as a two-player zero-sum game, where the Nash equilibrium policy guarantees a 50% win rate against any competing policy.
no code implementations • 15 Jun 2024 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
While both algorithms enjoy $O(1/T)$ ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix and $\widetilde{O}(1/T)$ convergence to coarse correlated equilibria even in general-sum games.
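To make the update rule concrete, below is a minimal, self-contained sketch (not the paper's implementation) of OMWU self-play on the 2x2 matching-pennies game. The only change relative to plain MWU is the optimistic gradient 2*g_t - g_{t-1}; with a small step size this yields last-iterate convergence to the uniform Nash equilibrium. All function names and the chosen step size are illustrative.

```python
import math

def matvec(A, v):
    # (A v)_i: expected loss of each row action against mixed strategy v
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def matTvec(A, v):
    # (A^T v)_j: expected payoff of each column action against mixed strategy v
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def mwu_step(p, grad, eta, sign):
    # Multiplicative-weights update on the simplex; sign=-1 minimizes, sign=+1 maximizes.
    w = [p[i] * math.exp(sign * eta * grad[i]) for i in range(len(p))]
    s = sum(w)
    return [v / s for v in w]

def omwu(A, x, y, eta=0.1, T=5000):
    """Optimistic MWU self-play: the x-player minimizes x^T A y, the y-player maximizes."""
    gx_prev, gy_prev = matvec(A, y), matTvec(A, x)
    for _ in range(T):
        gx, gy = matvec(A, y), matTvec(A, x)
        # optimistic gradient: twice the current signal minus the previous one
        x = mwu_step(x, [2 * g - p for g, p in zip(gx, gx_prev)], eta, -1)
        y = mwu_step(y, [2 * g - p for g, p in zip(gy, gy_prev)], eta, +1)
        gx_prev, gy_prev = gx, gy
    return x, y

# Matching pennies: unique Nash equilibrium at (0.5, 0.5) for both players.
A = [[1, -1], [-1, 1]]
x, y = omwu(A, [0.7, 0.3], [0.4, 0.6])
```

With the optimistic term removed (plain MWU), the same dynamics are known to cycle around the equilibrium rather than converge in the last iterate.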
no code implementations • 13 Mar 2024 • Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are non-concave -- a common scenario in machine learning applications involving strategies parameterized by deep neural networks, or when agents' utilities are computed by neural networks, or both.
no code implementations • 7 Feb 2024 • Xi Chen, Yang Cai, Yuan Wu, Bo Xiong, Taesung Park
Recently, MBConv blocks, initially designed for efficiency in resource-limited settings and later adapted for state-of-the-art performance, have demonstrated significant potential in image classification tasks.
no code implementations • 26 Jan 2024 • Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium.
no code implementations • 1 Nov 2023 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
Despite their widespread use for solving real games, virtually nothing is known about their last-iterate convergence.
no code implementations • 29 Jun 2023 • Yang Cai, Michael I. Jordan, Tianyi Lin, Argyris Oikonomou, Emmanouil-Vasileios Vlatakis-Gkaragkounis
Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds.
no code implementations • 16 Feb 2023 • Yang Cai, Zhe Feng, Christopher Liaw, Aranyak Mehta, Grigoris Velegkas
We characterize the optimal mechanism for this MDP as a Myerson's auction with a notion of modified virtual value, which relies on the value distribution of the advertiser, the current user state, and the future impact of showing the ad to the user.
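The paper's mechanism relies on a *modified* virtual value; as background, here is a minimal sketch of the classical Myerson virtual value phi(v) = v - (1 - F(v)) / f(v) that it modifies, evaluated for the uniform[0, 1] distribution (where phi(v) = 2v - 1 and the revenue-optimal reserve price is 1/2). The function names and the bisection-based reserve computation are illustrative, not from the paper.

```python
def virtual_value(v, cdf, pdf):
    # Classical Myerson virtual value: phi(v) = v - (1 - F(v)) / f(v).
    return v - (1.0 - cdf(v)) / pdf(v)

def reserve_price(cdf, pdf, lo=0.0, hi=1.0, iters=80):
    # For a regular distribution phi is increasing; bisect for phi(r) = 0,
    # which gives the revenue-optimal reserve in a single-item auction.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if virtual_value(mid, cdf, pdf) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Uniform[0, 1]: F(v) = v, f(v) = 1, so phi(v) = 2v - 1 and the reserve is 1/2.
uniform_cdf = lambda v: v
uniform_pdf = lambda v: 1.0
r = reserve_price(uniform_cdf, uniform_pdf)
```

In the paper's dynamic setting the virtual value is further adjusted by the user state and the future impact of showing the ad, which this static sketch does not model.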
1 code implementation • 30 Jan 2023 • Yang Cai, Weiqiang Zheng
We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games.
no code implementations • 6 Oct 2022 • Yang Cai, Weiqiang Zheng
Finally, we show that the Reflected Gradient (RG) method, another single-call single-projection algorithm, has $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate for constrained convex-concave min-max optimization, answering an open problem of [Hsieh et al., 2019].
no code implementations • 10 Jun 2022 • Yang Cai, Argyris Oikonomou, Weiqiang Zheng
In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods.
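For intuition about the family of methods EAG extends, here is a minimal sketch of the classical (unconstrained) extragradient method on the bilinear toy problem min_x max_y xy, whose unique saddle point is (0, 0). Plain gradient descent-ascent (GDA) spirals outward on this problem, while extragradient's lookahead step makes the iterates contract. This is an illustrative baseline, not the paper's EAG algorithm.

```python
import math

def field(z):
    # Descent-ascent vector field for f(x, y) = x * y:
    # (d f / d x, -d f / d y) = (y, -x).
    x, y = z
    return (y, -x)

def extragradient(z, eta=0.1, T=1000):
    for _ in range(T):
        fx, fy = field(z)
        half = (z[0] - eta * fx, z[1] - eta * fy)  # lookahead (extrapolation) step
        fx, fy = field(half)
        z = (z[0] - eta * fx, z[1] - eta * fy)     # update using the lookahead gradient
    return z

def gda(z, eta=0.1, T=1000):
    # Plain gradient descent-ascent, for contrast: diverges on bilinear problems.
    for _ in range(T):
        fx, fy = field(z)
        z = (z[0] - eta * fx, z[1] - eta * fy)
    return z

z_eg = extragradient((1.0, 1.0))
z_gda = gda((1.0, 1.0))
```

On this example each extragradient step shrinks the distance to the saddle point by the factor $\sqrt{1 - \eta^2 + \eta^4} < 1$, whereas each GDA step inflates it by $\sqrt{1 + \eta^2} > 1$.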
no code implementations • 20 Apr 2022 • Yang Cai, Argyris Oikonomou, Weiqiang Zheng
We use the tangent residual (or a slight variation of the tangent residual) as the potential function in our analysis of the extragradient algorithm (or the optimistic gradient descent-ascent algorithm) and prove that it is non-increasing between two consecutive iterates.
no code implementations • 25 Oct 2021 • Yang Cai, Constantinos Daskalakis
We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former.
no code implementations • 6 Nov 2019 • Johannes Brustle, Yang Cai, Constantinos Daskalakis
When item values are sampled from more general graphical models, we combine our robustness theorem with novel sample complexity results for learning Markov Random Fields or Bayesian Networks in Prokhorov distance, which may be of independent interest.
no code implementations • NeurIPS 2018 • Jessie Huang, Fa Wu, Doina Precup, Yang Cai
We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify.
no code implementations • 1 Sep 2017 • Yang Cai, Constantinos Daskalakis
The second is a more general max-min learning setting that we introduce, where we are given "approximate distributions," and we seek to compute an auction whose revenue is approximately optimal simultaneously for all "true distributions" that are close to the given ones.
no code implementations • 11 Aug 2014 • Yang Cai, Constantinos Daskalakis, Christos H. Papadimitriou
We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high quality data is provided at low cost, in the sense that the sum of payments and estimation error is minimized.