no code implementations • 2 Sep 2024 • Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman
In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players.
no code implementations • 29 Jul 2024 • Fathima Zarin Faizal, Asuman Ozdaglar, Martin J. Wainwright
We consider two settings that are distinguished by the type of information that each player has about the game and their opponent's strategy.
1 code implementation • 29 Jul 2024 • Mingyang Liu, Gabriele Farina, Asuman Ozdaglar
LiteEFG is an efficient library with easy-to-use Python bindings for solving multiplayer extensive-form games (EFGs).
no code implementations • 20 May 2024 • Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo
Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback.
1 code implementation • 3 May 2024 • Jiancong Xiao, Jiawei Zhang, Zhi-Quan Luo, Asuman Ozdaglar
To this end, we introduce Moreau Envelope-$\mathcal{A}$, a variant of the Moreau-envelope-type algorithm.
no code implementations • 30 Apr 2024 • Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar
We propose two approaches based on reward and preference aggregation, respectively: the former utilizes both utilitarian and leximin approaches to aggregate individual reward models, with sample complexity guarantees; the latter directly aggregates the human feedback in the form of probabilistic opinions.
no code implementations • 25 Mar 2024 • Chanwoo Park, Xiangyu Liu, Asuman Ozdaglar, Kaiqing Zhang
To better understand the limits of LLM agents in these interactive environments, we propose to study their interactions in benchmark decision-making settings in online learning and game theory, through the performance metric of \emph{regret}.
no code implementations • 30 Dec 2023 • Daniel Huttenlocher, Hannah Li, Liang Lyu, Asuman Ozdaglar, James Siderius
Most of the existing literature on platform recommendation algorithms focuses on user preferences and decisions, and does not simultaneously address creator incentives.
no code implementations • 8 Dec 2023 • Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman
Specifically, through a change of variable, we show that the update equation of the slow-timescale iterates resembles the classical smoothed best-response dynamics, where the regularized Nash gap serves as a valid Lyapunov function.
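As a hedged illustration of the slow-timescale behavior described above (not the paper's payoff-based dynamics), smoothed best response can be realized as a softmax over expected payoffs; in a toy matching-pennies game, iterating small steps toward each player's smoothed best response settles at the uniform mixed equilibrium. The game matrix, temperature, and step size below are made up for the sketch:

```python
import numpy as np

def smoothed_best_response(payoffs, tau=0.1):
    """Entropy-regularized (softmax) best response to a payoff vector."""
    z = payoffs / tau
    z -= z.max()  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Matching pennies: row player's payoff matrix.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.array([0.9, 0.1])  # row player's mixed strategy
y = np.array([0.5, 0.5])  # column player's mixed strategy

# Smoothed best-response dynamics: each player takes a small step
# toward the softmax best response to the opponent's current strategy.
eta = 0.1
for _ in range(2000):
    x = (1 - eta) * x + eta * smoothed_best_response(A @ y, tau=0.1)
    y = (1 - eta) * y + eta * smoothed_best_response(-A.T @ x, tau=0.1)

print(np.round(x, 3), np.round(y, 3))  # both approach (0.5, 0.5)
```

Here the smoothing temperature plays the role of the regularization, and the unique fixed point is the regularized equilibrium rather than the exact Nash equilibrium.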
no code implementations • 22 Aug 2023 • Amirhossein Reisizadeh, Khashayar Gatmiry, Asuman Ozdaglar
In many settings, however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federated learning, where a common latent variable governs the distribution of all the samples generated by a client.
no code implementations • NeurIPS 2023 • Chanwoo Park, Kaiqing Zhang, Asuman Ozdaglar
We study a new class of Markov games, \emph{(multi-player) zero-sum Markov games} with \emph{networked separable interactions} (zero-sum NMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making.
no code implementations • 28 Dec 2022 • Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang
In this work, we revisit the linear programming (LP) reformulation of Markov decision processes for offline RL, with the goal of developing algorithms with optimal $O(1/\sqrt{n})$ sample complexity, where $n$ is the sample size, under partial data coverage and general function approximation, and with favorable computational tractability.
no code implementations • 23 Oct 2022 • Sarath Pattathil, Kaiqing Zhang, Asuman Ozdaglar
We also generalize the results to certain function approximation settings.
no code implementations • 19 Jun 2022 • Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang
Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of the optimistic mirror descent algorithm as regret-minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs.
no code implementations • 9 Jun 2022 • Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang
Minimax optimization has served as the backbone of many machine learning (ML) problems.
no code implementations • 10 Jan 2022 • Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian, Asuman Ozdaglar
We consider a platform's problem of collecting data from privacy sensitive users to estimate an underlying parameter of interest.
no code implementations • 23 Nov 2021 • Asuman Ozdaglar, Muhammed O. Sayin, Kaiqing Zhang
We focus on the development of simple and independent learning dynamics for stochastic games: each agent is myopic and chooses best-response-type actions to the other agents' strategies without any coordination with her opponents.
1 code implementation • 14 Jun 2021 • Theo Diamandis, Yonina C. Eldar, Alireza Fallah, Farzan Farnia, Asuman Ozdaglar
We propose an optimal transport-based framework for MLR problems, Wasserstein Mixed Linear Regression (WMLR), which minimizes the Wasserstein distance between the learned and target mixture regression models.
no code implementations • NeurIPS 2021 • Muhammed O. Sayin, Kaiqing Zhang, David S. Leslie, Tamer Basar, Asuman Ozdaglar
The key challenge in this decentralized setting is the non-stationarity of the environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policies simultaneously and independently.
no code implementations • NeurIPS 2021 • Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar
In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems.
2 code implementations • NeurIPS 2020 • Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar
In this paper, we study a personalized variant of federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data.
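The adaptation idea can be sketched with scalar linear regression: a shared initialization is meta-trained (here with a MAML-style one-step meta-gradient, which this line of work builds on) so that a single local gradient step personalizes it to each client. The two-client setup, data, and step sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two clients with different local linear models y = a_i * x (synthetic data).
clients = []
for a in (1.0, 3.0):
    x_data = rng.normal(size=50)
    clients.append((x_data, a * x_data))

def grad(w, x, y):
    # Gradient of the local loss 0.5 * mean((w*x - y)**2) w.r.t. scalar w.
    return np.mean((w * x - y) * x)

alpha, beta = 0.05, 0.05  # inner (adaptation) and outer (meta) step sizes
w = 0.0
for _ in range(500):
    meta_grad = 0.0
    for x, y in clients:
        w_adapted = w - alpha * grad(w, x, y)  # one local gradient step
        # MAML chain rule; for this quadratic loss the Hessian is mean(x**2).
        hess = np.mean(x * x)
        meta_grad += (1 - alpha * hess) * grad(w_adapted, x, y)
    w -= beta * meta_grad / len(clients)

# One personal gradient step moves the shared model toward each client's a_i.
for x, y in clients:
    print(round(w - alpha * grad(w, x, y), 2))
```

The meta-trained `w` sits between the two local optima, and each client's single adaptation step strictly reduces its own loss.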
no code implementations • 23 Oct 2020 • Farzan Farnia, Asuman Ozdaglar
In this paper, we show that the optimization algorithm also plays a key role in the generalization performance of the trained minimax model.
no code implementations • 18 Oct 2020 • Manxi Wu, Saurabh Amin, Asuman Ozdaglar
Any fixed-point belief consistently estimates the payoff distribution given the fixed-point strategy profile.
no code implementations • 8 Oct 2020 • Muhammed O. Sayin, Francesca Parise, Asuman Ozdaglar
We present a novel variant of fictitious play dynamics combining classical fictitious play with Q-learning for stochastic games and analyze its convergence properties in two-player zero-sum stochastic games.
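The paper couples fictitious play with Q-learning for stochastic games; as a minimal illustration of the fictitious-play half only, here is classical fictitious play in a zero-sum matrix game, where each player best-responds to the empirical frequency of the opponent's past actions (the game and horizon are made up for the sketch):

```python
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies, row player's payoffs
counts_row = np.ones(2)  # row player's action counts (uniform prior)
counts_col = np.ones(2)  # column player's action counts (uniform prior)

for _ in range(20000):
    # Best response to the opponent's empirical action frequencies:
    # the row player maximizes, the column player minimizes, row payoff.
    a_row = np.argmax(A @ (counts_col / counts_col.sum()))
    a_col = np.argmin((counts_row / counts_row.sum()) @ A)
    counts_row[a_row] += 1
    counts_col[a_col] += 1

print(np.round(counts_row / counts_row.sum(), 3))  # empirical play near (0.5, 0.5)
```

By Robinson's theorem, the empirical frequencies converge to the Nash equilibrium in two-player zero-sum games; the paper's contribution is extending such dynamics to stochastic games, where payoffs must additionally be learned via Q-learning.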
no code implementations • L4DC 2020 • Manxi Wu, Saurabh Amin, Asuman Ozdaglar
We study a Bayesian learning dynamics induced by agents who repeatedly allocate loads on a set of resources based on their belief of an unknown parameter that affects the cost distributions of resources.
no code implementations • ICML 2020 • Farzan Farnia, Asuman Ozdaglar
We discuss several numerical experiments demonstrating the existence of proximal equilibrium solutions in GAN minimax problems.
no code implementations • 19 Feb 2020 • Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar
In this paper, we study a personalized variant of federated learning in which our goal is to find an initial shared model that current or new users can easily adapt to their local dataset by performing one or a few steps of gradient descent with respect to their own data.
1 code implementation • 13 Feb 2020 • Alireza Fallah, Asuman Ozdaglar, Sarath Pattathil
Next, we propose a multistage variant of stochastic GDA (M-GDA) that runs in multiple stages with a particular learning rate decay schedule and converges to the exact solution of the minimax problem.
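A simplified sketch in the spirit of a multistage scheme (the test function, noise level, stage lengths, and halving schedule are made up; the paper's schedule and guarantees differ): run stochastic GDA in stages, shrinking the step size between stages so that early stages make fast progress and later stages average out gradient noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Strongly-convex-strongly-concave toy problem (hypothetical):
# f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, with saddle point at (0, 0).
def stoch_grads(x, y, sigma=0.1):
    gx = x + y + sigma * rng.normal()  # noisy df/dx
    gy = x - y + sigma * rng.normal()  # noisy df/dy
    return gx, gy

x, y = 5.0, -5.0
eta = 0.2
for stage in range(6):        # each stage halves the step size,
    for _ in range(2000):     # trading progress speed for noise reduction
        gx, gy = stoch_grads(x, y)
        x -= eta * gx         # descent in x
        y += eta * gy         # ascent in y
    eta *= 0.5

print(round(x, 2), round(y, 2))  # near the saddle point (0, 0)
```

With a constant step size, the iterates would hover in a noise ball of radius proportional to the step size; shrinking it stage by stage drives them toward the exact solution.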
1 code implementation • NeurIPS 2021 • Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, Asuman Ozdaglar
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP.
no code implementations • 31 Jan 2020 • Noah Golowich, Sarath Pattathil, Constantinos Daskalakis, Asuman Ozdaglar
In this paper we study the smooth convex-concave saddle point problem.
no code implementations • 31 Oct 2019 • Weijie Liu, Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil, Zebang Shen, Nenggan Zheng
In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network.
no code implementations • 19 Oct 2019 • Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar, Umut Simsekli, Lingjiong Zhu
When gradients do not contain noise, we also prove that distributed accelerated methods can \emph{achieve acceleration}, requiring $\mathcal{O}(\kappa \log(1/\varepsilon))$ gradient evaluations and $\mathcal{O}(\kappa \log(1/\varepsilon))$ communications to converge to the same fixed point as the non-accelerated variant, where $\kappa$ is the condition number and $\varepsilon$ is the target accuracy.
no code implementations • 27 Aug 2019 • Alireza Fallah, Aryan Mokhtari, Asuman Ozdaglar
We study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall complexity as well as their best achievable accuracy in terms of gradient norm for nonconvex loss functions.
no code implementations • 3 Jun 2019 • Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil
To do so, we first show that both OGDA and EG can be interpreted as approximate variants of the proximal point method.
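This interpretation can be illustrated on the classic bilinear example $f(x, y) = xy$ (a standard test problem, not from this abstract): plain GDA spirals outward, while EG and OGDA, both of which approximate the implicit proximal-point update, spiral inward to the saddle point. The step size and horizon are chosen arbitrarily:

```python
import numpy as np

eta = 0.1  # common step size

def gda_step(x, y):
    # Plain gradient descent-ascent on f(x, y) = x*y.
    return x - eta * y, y + eta * x

def eg_step(x, y):
    # Extra-gradient: extrapolate to a midpoint, then step using its gradient.
    xm, ym = x - eta * y, y + eta * x
    return x - eta * ym, y + eta * xm

def run(step, n=200):
    x, y = 1.0, 1.0
    for _ in range(n):
        x, y = step(x, y)
    return np.hypot(x, y)  # distance to the saddle point (0, 0)

def ogda_run(n=200):
    # Optimistic GDA: a negative-momentum correction using the previous gradient.
    x, y = 1.0, 1.0
    xp, yp = x, y  # previous iterate (for the previous gradient)
    for _ in range(n):
        x_new = x - 2 * eta * y + eta * yp
        y_new = y + 2 * eta * x - eta * xp
        xp, yp = x, y
        x, y = x_new, y_new
    return np.hypot(x, y)

print(run(gda_step))  # grows: GDA diverges on bilinear problems
print(run(eg_step))   # shrinks toward 0
print(ogda_run())     # shrinks toward 0
```

The extrapolation in EG and the previous-gradient correction in OGDA are two different one-step approximations of evaluating the gradient at the (unknown) next iterate, which is exactly what the proximal point method does implicitly.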
no code implementations • 24 Jan 2019 • Aryan Mokhtari, Asuman Ozdaglar, Sarath Pattathil
In this paper we consider solving saddle point problems using two variants of the Gradient Descent Ascent algorithm: the Extra-gradient (EG) and Optimistic Gradient Descent Ascent (OGDA) methods.
no code implementations • NeurIPS 2019 • Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar
We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient.
no code implementations • NeurIPS 2018 • Aryan Mokhtari, Asuman Ozdaglar, Ali Jadbabaie
We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple, in the sense that a quadratic objective function can be efficiently optimized over it.
no code implementations • 12 Jul 2018 • Murat A. Erdogdu, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Denizcan Vanli
Furthermore, incorporating the Lanczos method into the block-coordinate maximization, we propose an algorithm that is guaranteed to return a solution that provides a $1-O(1/r)$ approximation to the original SDP without any assumptions, where $r$ is the rank of the factorization.
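A minimal sketch of the block-coordinate maximization on a Burer-Monteiro factorization (the matrix, rank, and iteration count are arbitrary; the Lanczos refinement is not shown): for the SDP $\max\ \langle C, X\rangle$ with $\mathrm{diag}(X)=1$, write $X = VV^\top$ with unit-norm rows $v_i$; fixing all other rows, the objective is linear in $v_i$, so the exact block update aligns $v_i$ with $g_i = \sum_j C_{ij} v_j$.

```python
import numpy as np

rng = np.random.default_rng(2)

n, r = 8, 3
C = rng.normal(size=(n, n))
C = (C + C.T) / 2          # symmetric cost matrix
np.fill_diagonal(C, 0.0)   # zero diagonal so the block objective is linear in v_i

# Burer-Monteiro factorization X = V V^T with unit-norm rows,
# so diag(X) = 1 holds automatically.
V = rng.normal(size=(n, r))
V /= np.linalg.norm(V, axis=1, keepdims=True)

def objective(V):
    return np.trace(C @ V @ V.T)

for _ in range(100):
    for i in range(n):
        g = C[i] @ V                 # g_i = sum_j C_ij v_j  (C_ii = 0)
        nrm = np.linalg.norm(g)
        if nrm > 0:
            V[i] = g / nrm           # exact maximizer of the block subproblem

print(round(objective(V), 3))
```

Each block update is the exact maximizer of its subproblem, so the objective values are monotonically non-decreasing and converge.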
no code implementations • 27 May 2018 • Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar
We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm.
no code implementations • NeurIPS 2017 • Mert Gurbuzbalaban, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Vanli
The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications.
no code implementations • NeurIPS 2013 • Christina E. Lee, Asuman Ozdaglar, Devavrat Shah
In this paper, we provide a novel algorithm that answers whether a chosen state in a Markov chain has stationary probability larger than some $\Delta \in (0, 1)$.
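A hedged illustration of the identity such local algorithms build on, $\pi(i) = 1/\mathbb{E}[T_i]$ where $T_i$ is the first return time to state $i$ (this is a plain Monte Carlo estimate on a made-up chain, not the paper's truncated algorithm or its guarantees):

```python
import numpy as np

rng = np.random.default_rng(3)

# A small ergodic Markov chain (rows are transition probabilities).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

def estimate_pi(state, n_walks=4000, max_len=1000):
    """Estimate pi(state) via the return-time identity pi(i) = 1/E[T_i]."""
    total = 0
    for _ in range(n_walks):
        s, t = state, 0
        while True:
            s = rng.choice(3, p=P[s])
            t += 1
            if s == state or t >= max_len:
                break
        total += t
    return n_walks / total

# Exact stationary distribution: the left eigenvector for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

for i in range(3):
    print(round(pi[i], 3), round(estimate_pi(i), 3))
```

To answer "is $\pi(i) > \Delta$?" locally, one only needs walks started at $i$, which is why return-time estimates (suitably truncated, as in the paper) avoid computing the full stationary distribution.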
no code implementations • 8 Mar 2008 • Angelia Nedić, Alex Olshevsky, Asuman Ozdaglar, John N. Tsitsiklis
We consider a convex unconstrained optimization problem that arises in a network of agents whose goal is to cooperatively optimize the sum of the individual agent objective functions through local computations and communications.
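A sketch of a distributed gradient method of this flavor (using smooth quadratic local costs and a fixed step size for simplicity; the agents, weights, and targets are made up): each agent mixes its estimate with its ring neighbors' estimates, then takes a local gradient step. With a constant step size the agents reach consensus in an $O(\alpha)$ neighborhood of the global optimum.

```python
import numpy as np

# Four agents on a ring, each holding f_i(x) = 0.5*(x - b_i)**2.
# The global optimum of sum_i f_i is the mean of the b_i.
b = np.array([1.0, 3.0, 5.0, 7.0])
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])  # doubly stochastic mixing weights

x = np.zeros(4)   # each agent's local estimate
alpha = 0.05      # fixed step size
for _ in range(3000):
    x = W @ x - alpha * (x - b)  # consensus averaging + local gradient step

print(np.round(x, 2))  # all agents near the optimum 4.0, up to O(alpha)
```

Only local computation and neighbor communication are used; no agent ever sees the full objective, yet the doubly stochastic mixing preserves the average and pulls all estimates toward the global minimizer.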
Optimization and Control