1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya
The Bayesian algorithm has access to a prior distribution over the meta-parameters, and its meta simple regret over $m$ bandit tasks with horizon $n$ is only $\tilde{O}(m / \sqrt{n})$.
1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh
We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks.
no code implementations • 12 Jun 2021 • MohammadJavad Azizi, Sheldon M Ross, Zhengyu Zhang
We propose to use the classical "vector at a time" (VT) rule, which samples each remaining arm once in each round.
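As a rough illustration of the "vector at a time" sampling pattern, the sketch below pulls every remaining arm once per round. The snippet above does not state how arms are removed, so the elimination rule used here (drop an arm once its success count trails the leader by `gap`) is an assumption. The Bernoulli rewards and the names `vector_at_a_time`, `means`, `gap`, and `max_rounds` are likewise hypothetical choices for this example.

```python
import random


def vector_at_a_time(means, gap=5, max_rounds=10_000, seed=0):
    """Sketch of VT sampling over Bernoulli arms (illustrative only).

    means: assumed success probability of each arm (hypothetical input).
    gap:   assumed elimination threshold, must be >= 1.
    Returns (index of the selected arm, total number of pulls).
    """
    rng = random.Random(seed)
    remaining = list(range(len(means)))
    successes = [0] * len(means)
    pulls = 0

    for _ in range(max_rounds):
        if len(remaining) == 1:
            break
        # VT step: sample each remaining arm exactly once in this round.
        for arm in remaining:
            successes[arm] += int(rng.random() < means[arm])
            pulls += 1
        # Assumed elimination rule: drop arms trailing the leader by `gap`.
        leader = max(successes[a] for a in remaining)
        remaining = [a for a in remaining if leader - successes[a] < gap]

    # If several arms survive max_rounds, pick the one with most successes.
    best = max(remaining, key=lambda a: successes[a])
    return best, pulls


if __name__ == "__main__":
    print(vector_at_a_time([0.3, 0.5, 0.7]))
```

Because every surviving arm has been pulled the same number of times, comparing raw success counts is equivalent to comparing empirical means, which is what makes a count-based elimination rule convenient under this sampling pattern.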