no code implementations • NeurIPS 2020 • Vrettos Moulos
For rewards generated from a one-parameter exponential family of Markov chains, we provide a finite-time upper bound for the regret incurred from this adaptive allocation rule, which reveals the logarithmic dependence of the regret on the time horizon, and which is asymptotically optimal.
no code implementations • 5 Jan 2020 • Vrettos Moulos
This paper develops a Hoeffding inequality for the partial sums $\sum_{k=1}^n f (X_k)$, where $\{X_k\}_{k \in \mathbb{Z}_{> 0}}$ is an irreducible Markov chain on a finite state space $S$, and $f : S \to [a, b]$ is a real-valued function.
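To make the quantity in this abstract concrete, the following minimal sketch simulates the partial sums $\sum_{k=1}^n f(X_k)$ for a small chain and compares their time average to the stationary mean of $f$. The two-state transition matrix `P`, the function values `f`, the run lengths, and the helper name `simulate_partial_sum` are all illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's bound or its constants.

```python
import numpy as np

def simulate_partial_sum(P, f, n, x0, rng):
    """Return sum_{k=1}^n f(X_k) for a chain with transition matrix P and X_1 = x0."""
    state = x0
    total = 0.0
    for _ in range(n):
        total += f[state]
        # Draw the next state from row `state` of the transition matrix.
        state = rng.choice(len(f), p=P[state])
    return total

# Hypothetical irreducible chain on S = {0, 1} (assumed for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
f = np.array([0.0, 1.0])  # f : S -> [a, b] with a = 0, b = 1

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
stationary_mean = float(pi @ f)

rng = np.random.default_rng(0)
n = 500
sums = np.array([simulate_partial_sum(P, f, n, 0, rng) for _ in range(100)])

# Empirical deviation of the time average from the stationary mean.
deviations = np.abs(sums / n - stationary_mean)
print("stationary mean of f:", stationary_mean)
print("mean absolute deviation:", deviations.mean())
```

A Hoeffding-type inequality controls how quickly the tail probability of such deviations decays as $n$ grows; increasing `n` in the sketch should shrink the printed deviation accordingly.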
no code implementations • NeurIPS 2019 • Vrettos Moulos
We give a complete characterization of the sampling complexity of best Markovian arm identification in one-parameter Markovian bandit models.