no code implementations • 11 Mar 2024 • Yufeng Zhang, Liyu Chen, Boyi Liu, Yingxiang Yang, Qiwen Cui, Yunzhe Tao, Hongxia Yang
Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale.
no code implementations • 12 Feb 2024 • Qiwen Cui, Maryam Fazel, Simon S. Du
We study how to learn the optimal tax design to maximize the efficiency in nonatomic congestion games.
no code implementations • 11 Feb 2024 • Yan Dai, Qiwen Cui, Simon S. Du
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL).
1 code implementation • 30 Oct 2023 • Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du
Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be important in sequential decision-making problems.
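To make the off-policy DP idea concrete, here is a minimal tabular $Q$-learning sketch — the standard textbook algorithm the snippet refers to, not this paper's method — on a hypothetical two-state toy MDP (the dynamics and reward below are invented for illustration):

```python
import numpy as np

n_states, n_actions = 2, 2
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

def step(s, a):
    # Hypothetical toy dynamics: only action 1 taken in state 0 yields reward 1;
    # the next state is uniformly random.
    r = 1.0 if (s == 0 and a == 1) else 0.0
    return r, int(rng.integers(n_states))

s = 0
for _ in range(5000):
    # Behavior policy: epsilon-greedy (the "off-policy" part -- the update
    # below bootstraps from the greedy action, not the one actually taken).
    a = int(rng.integers(n_actions)) if rng.random() < 0.2 else int(Q[s].argmax())
    r, s_next = step(s, a)
    # Q-learning update: temporal-difference step toward the Bellman target.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(int(Q[0].argmax()))  # greedy action learned for state 0
```

After enough steps the greedy action in state 0 is the rewarding action 1, illustrating how the $Q$-table converges without following the greedy policy itself.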
no code implementations • 12 Jun 2023 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges.
no code implementations • 7 Feb 2023 • Qiwen Cui, Kaiqing Zhang, Simon S. Du
In contrast, existing works on Markov games with function approximation have sample complexity bounds that scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents.
no code implementations • 24 Oct 2022 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
Starting from facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE.
no code implementations • 4 Jun 2022 • Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.
no code implementations • 1 Jun 2022 • Qiwen Cui, Simon S. Du
Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity scales only with $\sum_{i=1}^m A_i$, where $A_i$ is the action-space size of the $i$-th player and $m$ is the number of players.
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning (+1)
no code implementations • 1 Jun 2022 • Xinqi Wang, Qiwen Cui, Simon S. Du
This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning.
no code implementations • 10 Jan 2022 • Qiwen Cui, Simon S. Du
We study what dataset assumption permits solving offline two-player zero-sum Markov games.
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning (+2)
1 code implementation • 15 Jun 2021 • Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.
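The randomized-value-iteration idea behind RLSVI can be sketched in a few lines. This is my own toy tabular rendition of the general principle (inject Gaussian noise into estimated rewards so each planning round produces a randomly perturbed, often optimistic, value function), not the paper's actual algorithm; `R_hat`, `P_hat`, and `sigma` are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, horizon = 3, 2, 5

# Hypothetical empirical model estimates (rewards and transitions).
R_hat = rng.random((n_states, n_actions))
P_hat = np.full((n_states, n_actions, n_states), 1.0 / n_states)

def randomized_value_iteration(sigma=0.5):
    """Run a finite-horizon Bellman backup on noise-perturbed rewards."""
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(horizon):
        # Fresh Gaussian perturbation each step drives exploration,
        # playing the role that optimism bonuses play in UCB-style methods.
        noise = sigma * rng.standard_normal((n_states, n_actions))
        Q = R_hat + noise + P_hat @ V  # perturbed Bellman backup
        V = Q.max(axis=1)
    return Q

Q = randomized_value_iteration()
print(Q.shape)  # one entry per (state, action) pair
```

Acting greedily with respect to a freshly sampled `Q` at each episode yields exploration without explicit bonus terms, which is the appeal of the randomized approach.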
1 code implementation • 14 Jun 2021 • MingHan Yang, Dong Xu, Qiwen Cui, Zaiwen Wen, Pengxiang Xu
In this paper, a novel second-order method called NG+ is proposed.
no code implementations • 19 Feb 2021 • Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du
To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.
no code implementations • 29 Nov 2020 • Qiwen Cui, Lin F. Yang
The empirical success of multi-agent reinforcement learning is encouraging, yet few theoretical guarantees have been established.
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning (+1)
no code implementations • NeurIPS 2020 • Qiwen Cui, Lin F. Yang
However, the understanding of the sample optimality of model-based RL is still largely missing, even for the linear case.