How to gamble with non-stationary $\mathcal{X}$-armed bandits and have no regrets

20 Aug 2019 · Valeriy Avanesov

In the $\mathcal{X}$-armed bandit problem an agent sequentially interacts with an environment that yields a reward based on the vector input the agent provides. The agent's goal is to maximise the sum of these rewards over a number of time steps. The problem and its variations have been the subject of numerous studies, which propose sub-linear and sometimes optimal strategies. This paper introduces a novel variation of the problem: we consider an environment that can abruptly change its behaviour an unknown number of times. For this setting we propose a novel strategy and prove that it attains sub-linear cumulative regret. Moreover, when the relation between an action and the corresponding reward is highly smooth, the method is nearly optimal. The theoretical results are supported by an experimental study.
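To make the interaction protocol concrete, here is a minimal toy simulation of a non-stationary $\mathcal{X}$-armed bandit over $\mathcal{X} = [0, 1]$. This is only a sketch of the setting described in the abstract, not the paper's algorithm: the reward functions, the change point, and the naive epsilon-greedy agent with a sliding window are all hypothetical choices made for illustration.

```python
import numpy as np

# Toy non-stationary X-armed bandit on X = [0, 1] with one abrupt change
# in the reward function at a time unknown to the agent. Illustrates the
# interaction protocol only; NOT the paper's strategy.

rng = np.random.default_rng(0)
T = 2000                      # horizon
change_point = 1200           # unknown to the agent

def mean_reward(x, t):
    """Smooth mean-reward function that changes abruptly at change_point."""
    if t < change_point:
        return 1.0 - (x - 0.3) ** 2   # optimum at x = 0.3
    return 1.0 - (x - 0.8) ** 2       # optimum jumps to x = 0.8

# Naive agent (hypothetical): uniform discretisation of X plus an
# epsilon-greedy choice; a sliding window over recent rewards crudely
# adapts to abrupt changes in the environment.
arms = np.linspace(0.0, 1.0, 21)
window = 200
history = {a: [] for a in range(len(arms))}

cum_reward, cum_optimal = 0.0, 0.0
for t in range(T):
    if rng.random() < 0.1:            # explore uniformly
        a = int(rng.integers(len(arms)))
    else:                             # exploit windowed empirical means
        means = [np.mean(h[-window:]) if h else np.inf
                 for h in history.values()]
        a = int(np.argmax(means))
    r = mean_reward(arms[a], t) + 0.1 * rng.standard_normal()  # noisy reward
    history[a].append(r)
    cum_reward += mean_reward(arms[a], t)
    cum_optimal += max(mean_reward(x, t) for x in arms)

print(f"cumulative regret over discretised arms: {cum_optimal - cum_reward:.1f}")
```

Cumulative regret here is the gap between the reward of the best discretised action at each step and the reward of the action actually chosen; the paper's contribution is a strategy whose regret grows sub-linearly in $T$ despite the unknown change points.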
