Search Results for author: Zhihan Xiong

Found 7 papers, 2 papers with code

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

1 code implementation • 27 Jul 2023 • Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson

For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time, then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$.

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

no code implementations • 12 Jun 2023 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges.

Multi-agent Reinforcement Learning • Reinforcement Learning

Offline congestion games: How feedback type affects data coverage requirement

no code implementations • 24 Oct 2022 • Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

Starting from facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate Nash equilibrium (NE).

Learning in Congestion Games with Bandit Feedback

no code implementations • 4 Jun 2022 • Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du

We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.

Selective Sampling for Online Best-arm Identification

no code implementations • NeurIPS 2021 • Romain Camilleri, Zhihan Xiong, Maryam Fazel, Lalit Jain, Kevin Jamieson

The main results of this work precisely characterize this trade-off between labeled samples and stopping time and provide an algorithm that nearly-optimally achieves the minimal label complexity given a desired stopping time.

Binary Classification

Near-Optimal Randomized Exploration for Tabular Markov Decision Processes

no code implementations • 19 Feb 2021 • Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, Simon S. Du

To achieve the desired result, we develop 1) a new clipping operation to ensure both the probability of being optimistic and the probability of being pessimistic are lower bounded by a constant, and 2) a new recursive formula for the absolute value of estimation errors to analyze the regret.

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

1 code implementation • 23 Dec 2019 • Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla

We use an indexed value function to represent uncertainty in our action-value estimates.

Efficient Exploration • Reinforcement Learning
