Search Results for author: Zhihan Xiong

Found 8 papers, 2 papers with code

Dual Approximation Policy Optimization

no code implementations · 2 Oct 2024 · Zhihan Xiong, Maryam Fazel, Lin Xiao

We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods.

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

1 code implementation · 27 Jul 2023 · Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson

For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time, then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$.

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

no code implementations · 12 Jun 2023 · Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges.

Multi-agent Reinforcement Learning, Reinforcement Learning

Offline congestion games: How feedback type affects data coverage requirement

no code implementations · 24 Oct 2022 · Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE.

Vocal Bursts Type Prediction

Learning in Congestion Games with Bandit Feedback

no code implementations · 4 Jun 2022 · Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.

Selective Sampling for Online Best-arm Identification

no code implementations · NeurIPS 2021 · Romain Camilleri, Zhihan Xiong, Maryam Fazel, Lalit Jain, Kevin Jamieson

The main results of this work precisely characterize this trade-off between labeled samples and stopping time and provide an algorithm that nearly optimally achieves the minimal label complexity given a desired stopping time.

Binary Classification

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

no code implementations · 19 Feb 2021 · Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.
