Search Results for author: Qiaomin Xie

Found 27 papers, 2 papers with code

Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA

no code implementations • 9 Apr 2024 • Yixuan Zhang, Dongyan Huo, Yudong Chen, Qiaomin Xie

Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize.
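
In its standard form (notation ours, not taken from the paper), such a scheme iterates

$x_{k+1} = x_k + \alpha \big( F(x_k) - x_k + w_{k+1} \big),$

where $F$ is a contraction in some norm, $\alpha > 0$ is the fixed stepsize, and $w_{k+1}$ is noise; the Bellman optimality operator behind Q-learning is the canonical example, its max making $F$ nonsmooth.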

Q-Learning

Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

no code implementations • 25 Jan 2024 • Yixuan Zhang, Qiaomin Xie

By connecting constant-stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance and establish its exponential convergence rate.
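
For intuition, here is a minimal constant-stepsize Q-learning loop on a toy MDP (the transition and reward numbers below are invented for illustration, not taken from the paper): with a fixed stepsize the iterates fluctuate in a stationary regime around $Q^*$ rather than converging pointwise, and their long-run average carries an $O(\alpha)$ bias.

    import numpy as np

    rng = np.random.default_rng(0)
    gamma, alpha = 0.9, 0.1          # discount factor, constant stepsize

    # Hypothetical 2-state, 2-action MDP: P[s, a] is the next-state
    # distribution and R[s, a] the mean reward (illustrative numbers only).
    P = np.array([[[0.8, 0.2], [0.3, 0.7]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.5, 2.0]])

    Q = np.zeros((2, 2))
    iterates = []
    for t in range(20000):
        for s in range(2):                               # sweep all state-action pairs
            for a in range(2):
                s_next = rng.choice(2, p=P[s, a])        # sample a transition
                target = R[s, a] + gamma * Q[s_next].max()
                Q[s, a] += alpha * (target - Q[s, a])    # constant-stepsize update
        iterates.append(Q.copy())

    # The iterates form a Markov chain; their tail average estimates the mean
    # of the stationary distribution, which is Q* plus an O(alpha) bias.
    print(np.mean(iterates[10000:], axis=0))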

Q-Learning • Reinforcement Learning (RL)

Effectiveness of Constant Stepsize in Markovian LSA and Statistical Inference

no code implementations • 18 Dec 2023 • Dongyan Huo, Yudong Chen, Qiaomin Xie

Our procedure leverages the fast mixing property of constant-stepsize LSA for better covariance estimation and employs Richardson-Romberg (RR) extrapolation to reduce the bias induced by constant stepsize and Markovian data.
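
A minimal sketch of that recipe (the driving chain and matrices below are invented for illustration): run constant-stepsize LSA at stepsizes $\alpha$ and $2\alpha$ on the same kind of Markovian data, tail-average each run, and combine the two averages so the leading $O(\alpha)$ bias cancels.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical 2-state Markov chain driving the data, with A(x), b(x)
    # depending on the chain state (illustrative numbers only).
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    pi = np.array([2/3, 1/3])                          # stationary distribution of P
    A = np.array([[[1.0, 0.2], [0.2, 0.5]],
                  [[0.6, 0.0], [0.0, 1.2]]])
    b = np.array([[1.0, -1.0], [0.0, 2.0]])

    theta_star = np.linalg.solve(np.tensordot(pi, A, axes=1), pi @ b)

    def tail_avg(alpha, n=200_000, burn=100_000):
        """Constant-stepsize LSA on Markovian data; returns the tail-averaged iterate."""
        x, theta, acc = 0, np.zeros(2), np.zeros(2)
        for k in range(n):
            x = rng.choice(2, p=P[x])                  # Markovian, not i.i.d., data
            theta = theta + alpha * (b[x] - A[x] @ theta)
            if k >= burn:
                acc += theta
        return acc / (n - burn)

    alpha = 0.1
    t1, t2 = tail_avg(alpha), tail_avg(2 * alpha)      # biases ~ c*alpha and ~ 2c*alpha
    theta_rr = 2 * t1 - t2                             # Richardson-Romberg extrapolation
    print(np.linalg.norm(t1 - theta_star), np.linalg.norm(theta_rr - theta_star))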

Optimal Attack and Defense for Reinforcement Learning

no code implementations • 30 Nov 2023 • Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie

Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.

reinforcement-learning • Reinforcement Learning (RL)

Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

no code implementations • 1 Nov 2023 • Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie

We study the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium, with a value in a target range, while minimizing the modification cost.

Reinforcement Learning for SBM Graphon Games with Re-Sampling

no code implementations • 25 Oct 2023 • Peihan Huo, Oscar Peralta, Junyu Guo, Qiaomin Xie, Andreea Minca

In more realistic scenarios where the block model is unknown, we propose a graphon-based re-sampling scheme integrated with the finite N-player MP-MFG model.

reinforcement-learning • Stochastic Block Model

VISER: A Tractable Solution Concept for Games with Information Asymmetry

1 code implementation • 18 Jul 2023 • Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie

Many real-world games suffer from information asymmetry: one player is only aware of their own payoffs while the other player has the full game information.

Multi-agent Reinforcement Learning

Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements

no code implementations • 28 Jun 2023 • Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong Chen, Qiaomin Xie

Our work characterizes and quantifies the probabilistic structures intrinsic to these algorithms.
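
For a concrete instance (the bilinear objective and noise model here are ours, chosen for illustration), a constant-stepsize stochastic extragradient loop on $\min_x \max_y xy$: the last iterate behaves like a Markov chain around the saddle point $(0, 0)$, while the ergodic average concentrates near it.

    import numpy as np

    rng = np.random.default_rng(2)
    alpha = 0.05                                  # constant stepsize

    def noisy_op(z):
        """Stochastic operator of the bilinear saddle min_x max_y x*y."""
        x, y = z
        return np.array([y, -x]) + rng.normal(scale=0.5, size=2)

    z, avg = np.array([1.0, 1.0]), np.zeros(2)
    for k in range(1, 100_001):
        z_half = z - alpha * noisy_op(z)          # extrapolation step
        z = z - alpha * noisy_op(z_half)          # update step (stochastic extragradient)
        avg += (z - avg) / k                      # running ergodic average

    print(z, avg)   # last iterate wanders; the ergodic average is close to (0, 0)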

On Faking a Nash Equilibrium

no code implementations • 13 Jun 2023 • Young Wu, Jeremy McMahan, Xiaojin Zhu, Qiaomin Xie

We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium.

Data Poisoning • Multi-agent Reinforcement Learning +1

Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

no code implementations • 2 Jun 2023 • Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna

This latter objective is called stability and is especially important when the state space is unbounded, where states can be arbitrarily far apart and the agent can drift far away from the desired states.

Attribute • reinforcement-learning +1

SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits

no code implementations • 29 Jan 2023 • Subhojyoti Mukherjee, Qiaomin Xie, Josiah Hanna, Robert Nowak

In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits.

Experimental Design

Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes

no code implementations • 3 Oct 2022 • Dongyan Huo, Yudong Chen, Qiaomin Xie

We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data.

RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems

no code implementations • 14 Nov 2020 • Bai Liu, Qiaomin Xie, Eytan Modiano

In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized.

Model-based Reinforcement Learning • reinforcement-learning +1

Provable Fictitious Play for General Mean-Field Games

no code implementations • 8 Oct 2020 • Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca

We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
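
A toy sketch of the fictitious-play template for such a stationary mean-field game (the 2-state game below is invented for illustration): alternate between best-responding to the current mean-field state and averaging in the state distribution induced by that best response.

    import numpy as np

    gamma = 0.9
    # Hypothetical 2-state, 2-action game; P[s, a] is the next-state distribution.
    P = np.array([[[0.9, 0.1], [0.4, 0.6]],
                  [[0.3, 0.7], [0.8, 0.2]]])

    def reward(m):
        # Congestion-style reward: crowded states pay less (illustrative).
        return np.array([[1.0 - m[0], 0.5 - m[0]],
                         [0.8 - m[1], 1.2 - m[1]]])

    def best_response(m, iters=500):
        """Greedy policy from value iteration in the MDP induced by mean-field m."""
        R, Q = reward(m), np.zeros((2, 2))
        for _ in range(iters):
            Q = R + gamma * P @ Q.max(axis=1)
        return Q.argmax(axis=1)

    def induced_dist(policy, steps=500):
        """Long-run state distribution of the chain obtained by following `policy`."""
        mu = np.array([0.5, 0.5])
        for _ in range(steps):
            mu = sum(mu[s] * P[s, policy[s]] for s in range(2))
        return mu

    m = np.array([0.5, 0.5])                     # initial guess for the mean field
    for k in range(1, 201):                      # fictitious play: average, don't jump
        pi = best_response(m)
        m = m + (induced_dist(pi) - m) / k
    print(m, pi)                                 # candidate equilibrium pair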

reinforcement-learning • Reinforcement Learning (RL)

Dynamic Regret of Policy Optimization in Non-stationary Environments

no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels.

Reinforcement Learning (RL)

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility.
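
Concretely, the exponential-utility risk measure is the standard entropic risk of the cumulative reward (notation ours):

$\max_\pi \; \frac{1}{\beta} \log \mathbb{E}^{\pi}\big[ e^{\beta \sum_t r_t} \big],$

where $\beta \neq 0$ is the risk parameter: $\beta < 0$ yields risk-averse behavior, $\beta > 0$ risk-seeking behavior, and $\beta \to 0$ recovers the usual risk-neutral expected return.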

Q-Learning • reinforcement-learning +1

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

no code implementations • NeurIPS 2020 • Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces.

Stable Reinforcement Learning with Unbounded State Space

no code implementations • L4DC 2020 • Devavrat Shah, Qiaomin Xie, Zhi Xu

As a proof of concept, we propose an RL policy using a Sparse-Sampling-based Monte Carlo Oracle and argue that it satisfies the stability property as long as the system dynamics under the optimal policy respect a Lyapunov function.
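
The Lyapunov requirement is essentially the classical Foster-Lyapunov drift condition (stated here in our notation, not verbatim from the paper): for some function $V \ge 0$ and constant $\delta > 0$,

$\mathbb{E}\big[ V(s_{t+1}) - V(s_t) \mid s_t = s \big] \le -\delta \quad \text{whenever } V(s) \text{ is sufficiently large},$

so that, in expectation, the optimal policy drives the state back toward a bounded region.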

reinforcement-learning • Reinforcement Learning (RL) +1

On Reinforcement Learning for Turn-based Zero-sum Markov Games

no code implementations • 25 Feb 2020 • Devavrat Shah, Varun Somani, Qiaomin Xie, Zhi Xu

For a concrete instance of EIS in which a random policy is used for "exploration", Monte-Carlo Tree Search for "policy improvement", and Nearest Neighbors for "supervised learning", we establish that this method finds an $\varepsilon$-approximate value function of the Nash equilibrium in $\widetilde{O}(\varepsilon^{-(d+4)})$ steps when the underlying state space of the game is continuous and $d$-dimensional.

reinforcement-learning • Reinforcement Learning (RL)

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

no code implementations • 17 Feb 2020 • Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
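
Here the duality gap of a policy pair $(\mu, \nu)$ is the standard zero-sum notion (our notation):

$\mathrm{gap}(\mu, \nu) = \max_{\mu'} V(\mu', \nu) - \min_{\nu'} V(\mu, \nu'),$

which is nonnegative and vanishes exactly at a Nash equilibrium.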

Understanding & Generalizing AlphaGo Zero

no code implementations • ICLR 2019 • Ravichandra Addanki, Mohammad Alizadeh, Shaileshh Bojja Venkatakrishnan, Devavrat Shah, Qiaomin Xie, Zhi Xu

AlphaGo Zero (AGZ) introduced a new {\em tabula rasa} reinforcement learning algorithm that has achieved superhuman performance in the games of Go, Chess, and Shogi with no prior knowledge other than the rules of the game.

Decision Making • reinforcement-learning +2

Non-Asymptotic Analysis of Monte Carlo Tree Search

no code implementations • 14 Feb 2019 • Devavrat Shah, Qiaomin Xie, Zhi Xu

In effect, we establish that to learn an $\varepsilon$-approximation of the value function with respect to the $\ell_\infty$ norm, MCTS combined with nearest neighbors requires a sample size scaling as $\widetilde{O}\big(\varepsilon^{-(d+4)}\big)$, where $d$ is the dimension of the state space.

Q-learning with Nearest Neighbors

no code implementations • NeurIPS 2018 • Devavrat Shah, Qiaomin Xie

In particular, for MDPs with a $d$-dimensional state space and discount factor $\gamma \in (0, 1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^3(1-\gamma)^7)\big)$ samples.
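
A simplified state-aggregation sketch of the idea (the 1-d MDP and all constants below are invented, and the paper's actual algorithm uses nearest-neighbor regression with sharper guarantees): cover the continuous state space with anchor points and apply the Q-learning update at the anchor nearest to the observed state.

    import numpy as np

    rng = np.random.default_rng(3)
    gamma, alpha = 0.9, 0.1

    def step(s, a):
        """Hypothetical 1-d MDP on [0, 1]: actions drift left/right, reward peaks at 0.5."""
        s_next = np.clip(s + (0.1 if a == 1 else -0.1) + 0.05 * rng.normal(), 0.0, 1.0)
        return s_next, 1.0 - abs(s_next - 0.5)

    centers = np.linspace(0, 1, 21)              # nearest-neighbor anchor points
    Q = np.zeros((len(centers), 2))
    nn = lambda s: np.abs(centers - s).argmin()  # index of the nearest anchor

    s = 0.5
    for t in range(100_000):                     # single exploratory sample path
        a = rng.integers(2)
        s_next, r = step(s, a)
        target = r + gamma * Q[nn(s_next)].max()
        Q[nn(s), a] += alpha * (target - Q[nn(s), a])   # update only the nearest anchor
        s = s_next

    print(Q.argmax(axis=1))                      # greedy action at each anchor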

Q-Learning
