1 code implementation • 17 Aug 2022 • Yassir Jedra, Junghyun Lee, Alexandre Proutière, Se-Young Yun
We investigate the problems of model estimation and reward-free learning in episodic Block MDPs.
no code implementations • NeurIPS 2020 • Kaito Ariu, Narae Ryu, Se-Young Yun, Alexandre Proutière
Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, the component due to learning the probabilities that users like items, and finally the component arising from learning the underlying structure.
1 code implementation • 22 Oct 2020 • Kaito Ariu, Kenshi Abe, Alexandre Proutière
In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$ but the reward function depends on only a few of these features, say $s_0 \ll d$.
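To make the sparsity assumption concrete, here is a minimal sketch of the reward model only (not a bandit algorithm, and not the authors' method). It assumes the standard linear form, where the expected reward is the inner product of a parameter vector $\theta \in \mathbb{R}^d$ with the feature vector, $\theta$ has exactly $s_0$ nonzero entries, and observations carry Gaussian noise. The helper names `make_sparse_theta`, `expected_reward`, and `pull` are hypothetical, as are the support sampling scheme and the noise model.

```python
import random

def make_sparse_theta(d, s0, seed=0):
    """Hypothetical helper: build theta in R^d with exactly s0 nonzero entries."""
    rng = random.Random(seed)
    support = rng.sample(range(d), s0)  # indices of the s0 relevant features
    theta = [0.0] * d
    for j in support:
        # Magnitude in [0.5, 1.0] with a random sign, so no entry is accidentally zero.
        theta[j] = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.0)
    return theta, sorted(support)

def expected_reward(theta, x):
    """Linear mean reward <theta, x>; coordinates outside the support contribute 0."""
    return sum(t * xi for t, xi in zip(theta, x))

def pull(theta, x, noise_std, rng):
    """Observed reward: linear mean plus Gaussian noise (an assumed noise model)."""
    return expected_reward(theta, x) + rng.gauss(0.0, noise_std)

if __name__ == "__main__":
    d, s0 = 1000, 5
    theta, support = make_sparse_theta(d, s0, seed=1)
    rng = random.Random(2)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # one context's feature vector
    print("relevant features:", support)
    print("noisy reward:", pull(theta, x, noise_std=0.1, rng=rng))
```

Although each context lives in $\mathbb{R}^{1000}$, only 5 coordinates affect the reward. This gap between $d$ and $s_0$ is what sparse bandit methods aim to exploit.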