Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

NeurIPS 2021  ·  Arnab Maiti, Vishakha Patil, Arindam Khan ·

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show $\Omega(T^{2/3})$ cumulative regret in expectation for single-pass algorithms for arm-memory size of $(n-1)$, where $n$ is the number of arms. For best-arm identification, we provide an $(\varepsilon, \delta)$-PAC algorithm with arm memory size of $O(\log^*n)$ and $O(\frac{n}{\varepsilon^2}\cdot \log(\frac{1}{\delta}))$ optimal sample complexity.

PDF Abstract NeurIPS 2021 PDF NeurIPS 2021 Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here