Search Results for author: Zhihan Xiong

Found 7 papers, 2 papers with code

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

1 code implementation • 27 Jul 2023 • Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson

For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time, then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$.
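
As a rough illustration of the gap $\Delta_{(1)}$ in the snippet above, the following minimal sketch computes it for a small arm set. It assumes that $x^*$ is the arm with the highest reward under the time-averaged parameter $\frac{1}{T}\sum_t \theta_t$ (the snippet does not define $x^*$); the function name `min_gap` and the toy arms and parameters are hypothetical and not taken from the paper.

```python
def min_gap(arms, thetas):
    """Return Delta_(1) = min over x != x* of (x* - x)^T theta_bar.

    Assumption: x* is the best arm under the time-averaged parameter.
    """
    T = len(thetas)
    d = len(thetas[0])
    # Time-averaged parameter (1/T) * sum_t theta_t.
    theta_bar = [sum(th[i] for th in thetas) / T for i in range(d)]
    # Expected reward of each arm under the averaged parameter.
    rewards = [sum(x[i] * theta_bar[i] for i in range(d)) for x in arms]
    best = max(range(len(arms)), key=lambda k: rewards[k])
    # Smallest gap between the best arm and any other arm.
    return min(rewards[best] - r for k, r in enumerate(rewards) if k != best)


# Toy example (hypothetical data): three arms in d = 2, parameters over T = 3 rounds.
arms = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
thetas = [(1.0, 0.0), (0.0, 0.5), (1.0, 0.25)]
gap = min_gap(arms, thetas)
```

Because the gap depends only on the averaged parameter, individual $\theta_t$ may vary arbitrarily from round to round without changing $\Delta_{(1)}$.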

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

no code implementations • 12 Jun 2023 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges.

Multi-agent Reinforcement Learning • reinforcement-learning

Offline congestion games: How feedback type affects data coverage requirement

no code implementations • 24 Oct 2022 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Starting from the facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE.

Vocal Bursts Type Prediction

Learning in Congestion Games with Bandit Feedback

no code implementations • 4 Jun 2022 • Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.

Selective Sampling for Online Best-arm Identification

no code implementations • NeurIPS 2021 • Romain Camilleri, Zhihan Xiong, Maryam Fazel, Lalit Jain, Kevin Jamieson

The main results of this work precisely characterize this trade-off between labeled samples and stopping time and provide an algorithm that nearly optimally achieves the minimal label complexity given a desired stopping time.

Binary Classification

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

no code implementations • 19 Feb 2021 • Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.
