no code implementations • 24 Jun 2024 • Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
We study security threats to Markov games due to information asymmetry and misinformation.
no code implementations • 13 Jun 2024 • Jeremy McMahan, Giovanni Artiglio, Qiaomin Xie
We study robust Markov games (RMG) with $s$-rectangular uncertainty.
no code implementations • 7 Jun 2024 • Subhojyoti Mukherjee, Josiah P. Hanna, Qiaomin Xie, Robert Nowak
Interestingly, we show that our algorithm, without knowledge of the underlying problem structure, can learn a near-optimal policy in-context by leveraging the shared structure across diverse tasks.
no code implementations • 28 May 2024 • Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang
We show that our policy is asymptotically optimal with an $O(\exp(-C N))$ optimality gap for an $N$-armed problem, under the mild assumptions of aperiodic-unichain, non-degeneracy, and local stability.
no code implementations • 27 May 2024 • Dongyan Huo, Yixuan Zhang, Yudong Chen, Qiaomin Xie
By leveraging the smoothness and recurrence properties of the SA updates, we develop a fine-grained analysis of the correlation between the SA iterates $\theta_k$ and Markovian data $x_k$.
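For context, a generic constant-stepsize stochastic approximation recursion driven by Markovian data can be written, in illustrative notation rather than the paper's own, as $\theta_{k+1} = \theta_k + \alpha\, F(\theta_k, x_k)$ for $k \ge 0$, where $\alpha > 0$ is a fixed stepsize, $\{x_k\}$ is an ergodic Markov chain, and $F$ is the update map; in this setting the joint process $(\theta_k, x_k)$ is itself a time-homogeneous Markov chain.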
no code implementations • 9 Apr 2024 • Yixuan Zhang, Dongyan Huo, Yudong Chen, Qiaomin Xie
Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize.
no code implementations • 8 Feb 2024 • Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang
We consider the infinite-horizon, average-reward restless bandit problem in discrete time.
no code implementations • 25 Jan 2024 • Yixuan Zhang, Qiaomin Xie
By connecting constant-stepsize Q-learning to a time-homogeneous Markov chain, we show distributional convergence of the iterates in Wasserstein distance and establish an exponential convergence rate.
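As a concrete illustration (standard tabular notation with a fixed stepsize $\alpha$, not necessarily the paper's exact formulation), the constant-stepsize Q-learning update along a trajectory $(s_k, a_k, r_k, s_{k+1})$ is $Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha \big( r_k + \gamma \max_{a'} Q_k(s_{k+1}, a') - Q_k(s_k, a_k) \big)$, with all other entries of $Q_k$ left unchanged; viewing the pair $(Q_k, s_k)$ as a time-homogeneous Markov chain underlies the distributional-convergence result above.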
no code implementations • 18 Dec 2023 • Dongyan Huo, Yudong Chen, Qiaomin Xie
Our procedure leverages the fast mixing property of constant-stepsize LSA for better covariance estimation and employs Richardson-Romberg (RR) extrapolation to reduce the bias induced by constant stepsize and Markovian data.
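As a rough sketch of Richardson-Romberg extrapolation in this context (generic constants, not the paper's): if the averaged iterate run with stepsize $\alpha$ has bias $\mathbb{E}[\bar{\theta}^{(\alpha)}] = \theta^* + \alpha B + O(\alpha^2)$, then combining two runs with stepsizes $\alpha$ and $2\alpha$ as $\widetilde{\theta} = 2\bar{\theta}^{(\alpha)} - \bar{\theta}^{(2\alpha)}$ cancels the first-order term and leaves $\mathbb{E}[\widetilde{\theta}] = \theta^* + O(\alpha^2)$.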
1 code implementation • 30 Nov 2023 • Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie
Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.
1 code implementation • 1 Nov 2023 • Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
We study the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game, at minimum modification cost, so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and attains a value within a target range.
no code implementations • 25 Oct 2023 • Peihan Huo, Oscar Peralta, Junyu Guo, Qiaomin Xie, Andreea Minca
In more realistic scenarios where the block model is unknown, we propose a re-sampling scheme from a graphon integrated with the finite $N$-player MP-MFG model.
1 code implementation • 18 Jul 2023 • Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
Many real-world games suffer from information asymmetry: one player is only aware of their own payoffs while the other player has the full game information.
no code implementations • 28 Jun 2023 • Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong Chen, Qiaomin Xie
Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms.
no code implementations • 28 Jun 2023 • Zihan Zhang, Qiaomin Xie
In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition.
no code implementations • 13 Jun 2023 • Young Wu, Jeremy McMahan, Xiaojin Zhu, Qiaomin Xie
We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game.
1 code implementation • 2 Jun 2023 • Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna
This latter objective is called stability and is especially important when the state space is unbounded, so that states can be arbitrarily far from each other and the agent can drift far from the desired states.
1 code implementation • NeurIPS 2023 • Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang
In both settings, ours is the first asymptotic optimality result that does not require the uniform global attractor property (UGAP).
no code implementations • 29 Jan 2023 • Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert Nowak
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits.
no code implementations • 3 Oct 2022 • Dongyan Huo, Yudong Chen, Qiaomin Xie
We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data.
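In generic notation (illustrative rather than the paper's own), the constant-stepsize LSA recursion is $\theta_{k+1} = \theta_k + \alpha \big( A(x_k)\,\theta_k + b(x_k) \big)$, where $\alpha > 0$ is the stepsize, $\{x_k\}$ is the underlying Markov chain, and the stationary means $\bar{A}$, $\bar{b}$ of $A(\cdot)$, $b(\cdot)$ define the target solution via $\bar{A}\,\theta^* + \bar{b} = 0$.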
no code implementations • 4 Jun 2022 • Young Wu, Jeremy McMahan, Xiaojin Zhu, Qiaomin Xie
In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset.
no code implementations • 14 Nov 2020 • Bai Liu, Qiaomin Xie, Eytan Modiano
In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized.
no code implementations • 8 Oct 2020 • Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca
We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair consisting of a mean-field state and a stationary policy that together constitute the Nash equilibrium.
no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie
We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels.
no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie
We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility.
no code implementations • NeurIPS 2020 • Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar
Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces.
no code implementations • L4DC 2020 • Devavrat Shah, Qiaomin Xie, Zhi Xu
As a proof of concept, we propose an RL policy using a Sparse-Sampling-based Monte Carlo Oracle and argue that it satisfies the stability property as long as the system dynamics under the optimal policy respect a Lyapunov function.
no code implementations • 25 Feb 2020 • Devavrat Shah, Varun Somani, Qiaomin Xie, Zhi Xu
For a concrete instance of EIS in which a random policy is used for "exploration", Monte-Carlo Tree Search is used for "policy improvement", and Nearest Neighbors is used for "supervised learning", we establish that this method finds an $\varepsilon$-approximate value function of the Nash equilibrium in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state space of the game is continuous and $d$-dimensional.
no code implementations • 17 Feb 2020 • Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
no code implementations • ICLR 2019 • Ravichandra Addanki, Mohammad Alizadeh, Shaileshh Bojja Venkatakrishnan, Devavrat Shah, Qiaomin Xie, Zhi Xu
AlphaGo Zero (AGZ) introduced a new tabula rasa reinforcement learning algorithm that has achieved superhuman performance in the games of Go, Chess, and Shogi with no prior knowledge other than the rules of the game.
no code implementations • 14 Feb 2019 • Devavrat Shah, Qiaomin Xie, Zhi Xu
In effect, we establish that to learn an $\varepsilon$-approximation of the value function with respect to the $\ell_\infty$ norm, MCTS combined with nearest neighbor requires a sample size scaling as $\widetilde{O}\big(\varepsilon^{-(d+4)}\big)$, where $d$ is the dimension of the state space.
no code implementations • NeurIPS 2018 • Devavrat Shah, Qiaomin Xie
In particular, for MDPs with a $d$-dimensional state space and the discount factor $\gamma \in (0, 1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^3(1-\gamma)^7)\big)$ samples.