Search Results for author: Kimang Khun

Found 1 paper, 0 papers with code

Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?

No code implementations · 16 Jun 2021 · Nicolas Gast, Bruno Gaujal, Kimang Khun

While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$ where $K$ is the number of episodes, $n$ is the number of bandits and $S$ is the number of states of each bandit (the exact bound in S, n and K is given in the paper).
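The contrast in this excerpt can be made concrete with a rough count. The sketch assumes, as is usual for Markovian bandits, that the n bandits evolve independently. Under that assumption, a vanilla learner models them jointly as one MDP over the product of their state spaces. The sketch compares that joint state count, S^n, with the n·S local states a bandit-by-bandit model tracks. It also evaluates the order S·sqrt(nK) of the stated bound, with constants and logarithmic factors dropped. The function names are hypothetical, and this is illustrative arithmetic, not the authors' code or their exact bound.

```python
import math


def joint_state_count(n_bandits: int, states_per_bandit: int) -> int:
    """States of the global MDP when n independent S-state bandits are modeled as one system (S**n)."""
    return states_per_bandit ** n_bandits


def local_state_count(n_bandits: int, states_per_bandit: int) -> int:
    """Total states tracked when each bandit is modeled separately (n * S)."""
    return n_bandits * states_per_bandit


def regret_order(n_bandits: int, states_per_bandit: int, episodes: int) -> float:
    """Order of the episodic regret bound S * sqrt(n * K), ignoring constants and log factors."""
    return states_per_bandit * math.sqrt(n_bandits * episodes)


# Example: 10 bandits with 5 states each, over 10,000 episodes.
n, S, K = 10, 5, 10_000
print(joint_state_count(n, S))          # 9765625 joint states
print(local_state_count(n, S))          # 50 local states
print(round(regret_order(n, S, K), 1))  # about 1581.1 (order only)
```

Doubling the number of bandits multiplies the joint state count by a further factor of S^n. In contrast, it multiplies the S·sqrt(nK) order by only sqrt(2). This is the scalability gap the excerpt describes.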

Reinforcement Learning (RL)
