1 code implementation • 7 Jun 2022 • Riccardo Grazzi, Arya Akhavan, John Isak Texas Falk, Leonardo Cella, Massimiliano Pontil
This is a very strong notion of fairness, since the relative rank is not directly observed by the agent and depends on the underlying reward model and on the distribution of rewards.
no code implementations • 30 May 2022 • Leonardo Cella, Karim Lounici, Massimiliano Pontil
We aim to leverage this information to learn a new downstream bandit task that shares the same representation.
no code implementations • 21 Feb 2022 • Leonardo Cella, Karim Lounici, Grégoire Pacreau, Massimiliano Pontil
We study the problem of transfer-learning in the setting of stochastic linear bandit tasks.
no code implementations • 7 Dec 2020 • Leonardo Cella, Claudio Gentile, Massimiliano Pontil
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function so as to identify the best arm as quickly as possible.
no code implementations • ICML 2020 • Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil
The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution.
no code implementations • 24 Jan 2020 • Leonardo Cella, Ryan Martin
The standard notion of validity, which we refer to here as Type-1 validity, focuses on the coverage probability of prediction regions, whereas a notion of validity relevant to the other prediction-related tasks performed by predictive distributions is lacking.
no code implementations • 7 Oct 2019 • Leonardo Cella, Nicolò Cesa-Bianchi
Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.
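The abstract above describes arms whose expected reward depends on the time since they were last pulled. A minimal illustrative sketch of such a "recharging" reward model, under assumed ingredients not taken from the paper (the saturating curve `expected_reward`, its parameters `r_max` and `c`, and a naive greedy baseline policy), could look like:

```python
import numpy as np

# Hypothetical recharge curve: reward saturates as the idle time tau grows.
# This specific functional form is an assumption for illustration only.
def expected_reward(tau, r_max=1.0, c=5.0):
    """Expected reward of an arm last pulled tau rounds ago."""
    return r_max * (1.0 - np.exp(-np.asarray(tau, dtype=float) / c))

rng = np.random.default_rng(0)
n_arms, horizon = 3, 20
idle = np.zeros(n_arms)  # rounds since each arm was last pulled

total = 0.0
for t in range(horizon):
    # Greedy baseline: pull the arm with the largest current expected reward.
    arm = int(np.argmax(expected_reward(idle)))
    total += float(expected_reward(idle[arm])) + 0.1 * rng.standard_normal()
    idle += 1      # every arm "recharges" by one round...
    idle[arm] = 0  # ...except the pulled arm, whose idle time resets
```

Under this model a policy that always hammers the single best arm is suboptimal, since pulling an arm resets its reward to the bottom of the recharge curve.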
no code implementations • 28 Sep 2018 • Ilja Kuzborskij, Leonardo Cella, Nicolò Cesa-Bianchi
More precisely, we show that a sketch of size $m$ allows a $\mathcal{O}(md)$ update time for both algorithms, as opposed to $\Omega(d^2)$ required by their non-sketched versions in general (where $d$ is the dimension of context vectors).
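The $\mathcal{O}(md)$ update cost above comes from maintaining an $m \times d$ matrix sketch instead of the full $d \times d$ correlation matrix. A generic Frequent Directions update, shown here as a standard-technique sketch rather than the paper's exact algorithm (the helper `fd_update` and its shrinkage choice are illustrative), is:

```python
import numpy as np

def fd_update(S, x):
    """Insert context vector x (length d) into the m x d Frequent
    Directions sketch S; returns the updated sketch."""
    m, d = S.shape
    zero_rows = np.where(~S.any(axis=1))[0]
    if len(zero_rows) == 0:
        # Sketch full: shrink singular values so the bottom rows zero out.
        # The SVD costs O(m^2 d) but runs only every O(m) inserts,
        # so the amortized per-insert cost stays O(md).
        U, s, Vt = np.linalg.svd(S, full_matrices=False)
        delta = s[m // 2] ** 2
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        S = s[:, None] * Vt
        zero_rows = np.where(~S.any(axis=1))[0]
    S[zero_rows[0]] = x  # O(d) write into a freed row
    return S

d, m = 50, 10
rng = np.random.default_rng(1)
S = np.zeros((m, d))
X = rng.standard_normal((200, d))
for x in X:
    S = fd_update(S, x)
# S.T @ S now approximates X.T @ X in spectral norm.
```

The trade-off is the one the excerpt states: the sketched statistics cost $\mathcal{O}(md)$ per round versus the $\Omega(d^2)$ of exact rank-one updates, at the price of a controlled approximation error in the estimated correlation matrix.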