Search Results for author: Qiaomin Xie

Found 27 papers, 2 papers with code

Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA

no code implementations • 9 Apr 2024 • Yixuan Zhang, Dongyan Huo, Yudong Chen, Qiaomin Xie

Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize.
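
In its standard form (notation ours, not taken from the paper), such a scheme iterates

$x_{k+1} = x_k + \alpha \big( F(x_k) - x_k + w_{k+1} \big),$

where $F$ is a contraction in some norm, $\alpha > 0$ is the fixed stepsize, and $w_{k+1}$ is noise; the Bellman optimality operator behind Q-learning is the canonical example, its max making $F$ nonsmooth.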

Q-Learning

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

no code implementations • 25 Jan 2024 • Yixuan Zhang, Qiaomin Xie

By connecting constant-stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance and establish its exponential convergence rate.
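
For intuition, here is a minimal constant-stepsize Q-learning loop on a toy MDP (the transition and reward numbers below are invented for illustration, not taken from the paper): with a fixed stepsize the iterates fluctuate in a stationary regime around $Q^*$ rather than converging pointwise, and their long-run average carries an $O(\alpha)$ bias.

    import numpy as np

    rng = np.random.default_rng(0)
    gamma, alpha = 0.9, 0.1          # discount factor, constant stepsize

    # Hypothetical 2-state, 2-action MDP: P[s, a] is the next-state
    # distribution and R[s, a] the mean reward (illustrative numbers only).
    P = np.array([[[0.8, 0.2], [0.3, 0.7]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.5, 2.0]])

    Q = np.zeros((2, 2))
    iterates = []
    for t in range(20000):
        for s in range(2):                               # sweep all state-action pairs
            for a in range(2):
                s_next = rng.choice(2, p=P[s, a])        # sample a transition
                target = R[s, a] + gamma * Q[s_next].max()
                Q[s, a] += alpha * (target - Q[s, a])    # constant-stepsize update
        iterates.append(Q.copy())

    # The iterates form a Markov chain; their tail average estimates the mean
    # of the stationary distribution, which is Q* plus an O(alpha) bias.
    print(np.mean(iterates[10000:], axis=0))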

Q-Learning • Reinforcement Learning (RL)

Effectiveness of Constant Stepsize in Markovian LSA and Statistical Inference

no code implementations • 18 Dec 2023 • Dongyan Huo, Yudong Chen, Qiaomin Xie

Our procedure leverages the fast mixing property of constant-stepsize LSA for better covariance estimation and employs Richardson-Romberg (RR) extrapolation to reduce the bias induced by constant stepsize and Markovian data.
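
A minimal sketch of that recipe (the driving chain and matrices below are invented for illustration): run constant-stepsize LSA at stepsizes $\alpha$ and $2\alpha$ on the same kind of Markovian data, tail-average each run, and combine the two averages so the leading $O(\alpha)$ bias cancels.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical 2-state Markov chain driving the data, with A(x), b(x)
    # depending on the chain state (illustrative numbers only).
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    pi = np.array([2/3, 1/3])                          # stationary distribution of P
    A = np.array([[[1.0, 0.2], [0.2, 0.5]],
                  [[0.6, 0.0], [0.0, 1.2]]])
    b = np.array([[1.0, -1.0], [0.0, 2.0]])

    theta_star = np.linalg.solve(np.tensordot(pi, A, axes=1), pi @ b)

    def tail_avg(alpha, n=200_000, burn=100_000):
        """Constant-stepsize LSA on Markovian data; returns the tail-averaged iterate."""
        x, theta, acc = 0, np.zeros(2), np.zeros(2)
        for k in range(n):
            x = rng.choice(2, p=P[x])                  # Markovian, not i.i.d., data
            theta = theta + alpha * (b[x] - A[x] @ theta)
            if k >= burn:
                acc += theta
        return acc / (n - burn)

    alpha = 0.1
    t1, t2 = tail_avg(alpha), tail_avg(2 * alpha)      # biases ~ c*alpha and ~ 2c*alpha
    theta_rr = 2 * t1 - t2                             # Richardson-Romberg extrapolation
    print(np.linalg.norm(t1 - theta_star), np.linalg.norm(theta_rr - theta_star))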

Optimal Attack and Defense for Reinforcement Learning

no code implementations • 30 Nov 2023 • Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie

Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.

reinforcement-learning • Reinforcement Learning (RL)

Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

no code implementations • 1 Nov 2023 • Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie

We study the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium, with a value in a target range, while minimizing the modification cost.

Reinforcement Learning for SBM Graphon Games with Re-Sampling

no code implementations • 25 Oct 2023 • Peihan Huo, Oscar Peralta, Junyu Guo, Qiaomin Xie, Andreea Minca

In more realistic scenarios where the block model is unknown, we propose a graphon-based re-sampling scheme integrated with the finite N-player MP-MFG model.

reinforcement-learning • Stochastic Block Model

VISER: A Tractable Solution Concept for Games with Information Asymmetry

1 code implementation • 18 Jul 2023 • Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie

Many real-world games suffer from information asymmetry: one player is only aware of their own payoffs while the other player has the full game information.

Multi-agent Reinforcement Learning

Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

no code implementations • 28 Jun 2023 • Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong Chen, Qiaomin Xie

Our work characterizes and quantifies the probabilistic structures intrinsic to these algorithms.
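
For a concrete instance (the bilinear objective and noise model here are ours, chosen for illustration), a constant-stepsize stochastic extragradient loop on $\min_x \max_y xy$: the last iterate behaves like a Markov chain around the saddle point $(0, 0)$, while the ergodic average concentrates near it.

    import numpy as np

    rng = np.random.default_rng(2)
    alpha = 0.05                                  # constant stepsize

    def noisy_op(z):
        """Stochastic operator of the bilinear saddle min_x max_y x*y."""
        x, y = z
        return np.array([y, -x]) + rng.normal(scale=0.5, size=2)

    z, avg = np.array([1.0, 1.0]), np.zeros(2)
    for k in range(1, 100_001):
        z_half = z - alpha * noisy_op(z)          # extrapolation step
        z = z - alpha * noisy_op(z_half)          # update step (stochastic extragradient)
        avg += (z - avg) / k                      # running ergodic average

    print(z, avg)   # last iterate wanders; the ergodic average is close to (0, 0)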

On Faking a Nash Equilibrium

no code implementations • 13 Jun 2023 • Young Wu, Jeremy McMahan, Xiaojin Zhu, Qiaomin Xie

We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium.

Data Poisoning • Multi-agent Reinforcement Learning +1

Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

no code implementations • 2 Jun 2023 • Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna

This latter objective is called stability and is especially important when the state space is unbounded, where states can be arbitrarily far apart and the agent can drift far away from the desired states.

Attribute • reinforcement-learning +1

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

no code implementations • 29 Jan 2023 • Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert Nowak

In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits.

Experimental Design

Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes

no code implementations • 3 Oct 2022 • Dongyan Huo, Yudong Chen, Qiaomin Xie

We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data.

RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems

no code implementations • 14 Nov 2020 • Bai Liu, Qiaomin Xie, Eytan Modiano

In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized.

Model-based Reinforcement Learning • reinforcement-learning +1

Provable Fictitious Play for General Mean-Field Games

no code implementations • 8 Oct 2020 • Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
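
A toy sketch of the fictitious-play template for such a stationary mean-field game (the 2-state game below is invented for illustration): alternate between best-responding to the current mean-field state and averaging in the state distribution induced by that best response.

    import numpy as np

    gamma = 0.9
    # Hypothetical 2-state, 2-action game; P[s, a] is the next-state distribution.
    P = np.array([[[0.9, 0.1], [0.4, 0.6]],
                  [[0.3, 0.7], [0.8, 0.2]]])

    def reward(m):
        # Congestion-style reward: crowded states pay less (illustrative).
        return np.array([[1.0 - m[0], 0.5 - m[0]],
                         [0.8 - m[1], 1.2 - m[1]]])

    def best_response(m, iters=500):
        """Greedy policy from value iteration in the MDP induced by mean-field m."""
        R, Q = reward(m), np.zeros((2, 2))
        for _ in range(iters):
            Q = R + gamma * P @ Q.max(axis=1)
        return Q.argmax(axis=1)

    def induced_dist(policy, steps=500):
        """Long-run state distribution of the chain obtained by following `policy`."""
        mu = np.array([0.5, 0.5])
        for _ in range(steps):
            mu = sum(mu[s] * P[s, policy[s]] for s in range(2))
        return mu

    m = np.array([0.5, 0.5])                     # initial guess for the mean field
    for k in range(1, 201):                      # fictitious play: average, don't jump
        pi = best_response(m)
        m = m + (induced_dist(pi) - m) / k
    print(m, pi)                                 # candidate equilibrium pair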

reinforcement-learning • Reinforcement Learning (RL)

Dynamic Regret of Policy Optimization in Non-stationary Environments

no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels.

Reinforcement Learning (RL)

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility.
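
Concretely, the exponential-utility risk measure is the standard entropic risk of the cumulative reward (notation ours):

$\max_\pi \; \frac{1}{\beta} \log \mathbb{E}^{\pi}\big[ e^{\beta \sum_t r_t} \big],$

where $\beta \neq 0$ is the risk parameter: $\beta < 0$ yields risk-averse behavior, $\beta > 0$ risk-seeking behavior, and $\beta \to 0$ recovers the usual risk-neutral expected return.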

Q-Learning • reinforcement-learning +1

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

no code implementations • NeurIPS 2020 • Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces.

Stable Reinforcement Learning with Unbounded State Space

no code implementations • L4DC 2020 • Devavrat Shah, Qiaomin Xie, Zhi Xu

As a proof of concept, we propose an RL policy using a Sparse-Sampling-based Monte Carlo Oracle and argue that it satisfies the stability property as long as the system dynamics under the optimal policy respect a Lyapunov function.
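
The Lyapunov requirement is essentially the classical Foster-Lyapunov drift condition (stated here in our notation, not verbatim from the paper): for some function $V \ge 0$ and constant $\delta > 0$,

$\mathbb{E}\big[ V(s_{t+1}) - V(s_t) \mid s_t = s \big] \le -\delta \quad \text{whenever } V(s) \text{ is sufficiently large},$

so that, in expectation, the optimal policy drives the state back toward a bounded region.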

reinforcement-learning • Reinforcement Learning (RL) +1

On Reinforcement Learning for Turn-based Zero-sum Markov Games

no code implementations • 25 Feb 2020 • Devavrat Shah, Varun Somani, Qiaomin Xie, Zhi Xu

For a concrete instance of EIS in which a random policy is used for "exploration", Monte-Carlo Tree Search for "policy improvement", and Nearest Neighbors for "supervised learning", we establish that this method finds an $\varepsilon$-approximate value function of the Nash equilibrium in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state space of the game is continuous and $d$-dimensional.

reinforcement-learning • Reinforcement Learning (RL)

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

no code implementations • 17 Feb 2020 • Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
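
Here the duality gap of a policy pair $(\mu, \nu)$ is the standard zero-sum notion (our notation):

$\mathrm{gap}(\mu, \nu) = \max_{\mu'} V(\mu', \nu) - \min_{\nu'} V(\mu, \nu'),$

which is nonnegative and vanishes exactly at a Nash equilibrium.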

Understanding & Generalizing AlphaGo Zero

no code implementations • ICLR 2019 • Ravichandra Addanki, Mohammad Alizadeh, Shaileshh Bojja Venkatakrishnan, Devavrat Shah, Qiaomin Xie, Zhi Xu

AlphaGo Zero (AGZ) introduced a new {\em tabula rasa} reinforcement learning algorithm that has achieved superhuman performance in the games of Go, Chess, and Shogi with no prior knowledge other than the rules of the game.

Decision Making • reinforcement-learning +2

Non-Asymptotic Analysis of Monte Carlo Tree Search

no code implementations • 14 Feb 2019 • Devavrat Shah, Qiaomin Xie, Zhi Xu

In effect, we establish that to learn an $\varepsilon$-approximation of the value function with respect to the $\ell_\infty$ norm, MCTS combined with nearest neighbors requires a sample size scaling as $\widetilde{O}\big(\varepsilon^{-(d+4)}\big)$, where $d$ is the dimension of the state space.

Q-learning with Nearest Neighbors

no code implementations • NeurIPS 2018 • Devavrat Shah, Qiaomin Xie

In particular, for MDPs with a $d$-dimensional state space and discount factor $\gamma \in (0, 1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^3(1-\gamma)^7)\big)$ samples.
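
A simplified state-aggregation sketch of the idea (the 1-d MDP and all constants below are invented, and the paper's actual algorithm uses nearest-neighbor regression with sharper guarantees): cover the continuous state space with anchor points and apply the Q-learning update at the anchor nearest to the observed state.

    import numpy as np

    rng = np.random.default_rng(3)
    gamma, alpha = 0.9, 0.1

    def step(s, a):
        """Hypothetical 1-d MDP on [0, 1]: actions drift left/right, reward peaks at 0.5."""
        s_next = np.clip(s + (0.1 if a == 1 else -0.1) + 0.05 * rng.normal(), 0.0, 1.0)
        return s_next, 1.0 - abs(s_next - 0.5)

    centers = np.linspace(0, 1, 21)              # nearest-neighbor anchor points
    Q = np.zeros((len(centers), 2))
    nn = lambda s: np.abs(centers - s).argmin()  # index of the nearest anchor

    s = 0.5
    for t in range(100_000):                     # single exploratory sample path
        a = rng.integers(2)
        s_next, r = step(s, a)
        target = r + gamma * Q[nn(s_next)].max()
        Q[nn(s), a] += alpha * (target - Q[nn(s), a])   # update only the nearest anchor
        s = s_next

    print(Q.argmax(axis=1))                      # greedy action at each anchor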

Q-Learning
