no code implementations • 2 Oct 2023 • Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini
An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem.
no code implementations • 3 Jul 2023 • Dirk van der Hoeven, Ciara Pike-Burke, Hao Qiu, Nicolo Cesa-Bianchi
Here, before making their prediction, each expert must be paid.
no code implementations • 1 Feb 2023 • Sattar Vakili, Danyal Ahmed, Alberto Bernacchia, Ciara Pike-Burke
An abstraction of the problem can be formulated as a kernel-based bandit problem (also known as Bayesian optimisation), where a learner aims to optimise a kernelized function through sequential noisy observations.
no code implementations • 21 Jul 2022 • Benjamin Howson, Ciara Pike-Burke, Sarah Filippi
However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed.
no code implementations • 25 Nov 2021 • Gábor Lugosi, Ciara Pike-Burke, Pierre-André Savalle
The fidelity bandits problem is a variant of the $K$-armed bandit problem in which the reward of each arm is augmented by a fidelity reward that provides the player with an additional payoff depending on how 'loyal' the player has been to that arm in the past.
no code implementations • 15 Nov 2021 • Benjamin Howson, Ciara Pike-Burke, Sarah Filippi
In this paper, we study the impact of delayed feedback in episodic reinforcement learning from a theoretical perspective and propose two general-purpose approaches to handling the delays.
no code implementations • NeurIPS 2021 • Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta
Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side.
no code implementations • NeurIPS 2020 • Gergely Neu, Ciara Pike-Burke
The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms.
1 code implementation • NeurIPS 2019 • Ciara Pike-Burke, Steffen Grünewälder
We study the recovering bandits problem, a variant of the stochastic multi-armed bandit problem where the expected reward of each arm varies according to some unknown function of the time since the arm was last played.
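The recovering-bandit setting described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the class `RecoveringArm`, the helper `simulate`, the convention of counting idle rounds from zero, and the recovery function passed in are all hypothetical, and rewards are taken to be noiseless expectations for simplicity.

```python
class RecoveringArm:
    """One arm whose expected reward depends on how long it has been idle."""

    def __init__(self, recovery_fn):
        # Hypothetical recovery function: maps idle rounds -> expected reward.
        # In the actual problem this function is unknown to the learner.
        self.recovery_fn = recovery_fn
        self.idle_rounds = 0  # assumption: every arm starts with zero idle rounds

    def expected_reward(self):
        return self.recovery_fn(self.idle_rounds)

    def tick(self, played):
        # Reset the clock if the arm was played, otherwise let it keep recovering.
        self.idle_rounds = 0 if played else self.idle_rounds + 1


def simulate(arms, schedule):
    """Play arms in the given order and return the total expected reward."""
    total = 0.0
    for choice in schedule:
        total += arms[choice].expected_reward()
        for i, arm in enumerate(arms):
            arm.tick(played=(i == choice))
    return total
```

With an increasing recovery function such as `lambda z: min(z, 3)`, alternating between two arms collects more reward than repeatedly playing one arm, which is the kind of structure a learner in this setting would need to exploit.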
no code implementations • ICML 2018 • Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvári, Steffen Grünewälder
In this problem, when the player pulls an arm, a reward is generated; however, it is not immediately observed.
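The gap between when a reward is generated and when it is observed can be sketched with a simple buffer. This is an illustrative sketch only: the class `DelayedFeedback`, its methods, and the per-pull integer `delay` are hypothetical, since the snippet above does not specify how delays arise or in what form the rewards eventually reach the player.

```python
import heapq


class DelayedFeedback:
    """Holds generated rewards until their (assumed) delay has elapsed."""

    def __init__(self):
        self._pending = []  # min-heap of (arrival_round, arm, reward)
        self.round = 0

    def pull(self, arm, reward, delay):
        """Record a reward generated now that becomes observable `delay` rounds later."""
        heapq.heappush(self._pending, (self.round + delay, arm, reward))

    def step(self):
        """Advance one round and return the (arm, reward) pairs that are now observable."""
        self.round += 1
        arrived = []
        while self._pending and self._pending[0][0] <= self.round:
            _, arm, reward = heapq.heappop(self._pending)
            arrived.append((arm, reward))
        return arrived
```

A learner built on top of such a buffer would update its estimates only from what `step()` returns, so rewards from recent pulls may still be missing when it chooses the next arm.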