no code implementations • 10 Jul 2015 • Pratik Gajane, Tanguy Urvoy
Partial monitoring is a generic framework for sequential decision-making with incomplete feedback.
no code implementations • 15 Jan 2016 • Pratik Gajane, Tanguy Urvoy, Fabrice Clérot
We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms.
no code implementations • 16 Aug 2017 • Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann
In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of a transformation of these rewards through a stochastic corruption process with known parameters.
no code implementations • 9 Oct 2017 • Pratik Gajane, Mykola Pechenizkiy
Machine learning algorithms for prediction are increasingly being used in critical decisions affecting human lives.
no code implementations • 23 Nov 2017 • Carolin Lawrence, Pratik Gajane, Stefan Riezler
Counterfactual learning is a natural scenario to improve web-based machine translation services by offline learning from feedback logged during user interactions.
no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer
We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time.
no code implementations • 14 May 2019 • Pratik Gajane, Ronald Ortner, Peter Auer
This is the first variational regret bound for the general reinforcement learning setting.
no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.
1 code implementation • 3 Nov 2021 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We consider a special case of bandit problems, namely batched bandits.
1 code implementation • 14 Feb 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
Our main theoretical results show that the impact of batch learning is a multiplicative factor of batch size relative to the regret of online behavior.
no code implementations • 20 May 2022 • Pratik Gajane, Akrati Saxena, Maryam Tavakol, George Fletcher, Mykola Pechenizkiy
In this article, we provide an extensive overview of fairness approaches that have been implemented via a reinforcement learning (RL) framework.
1 code implementation • 8 Sep 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We study a posterior sampling approach to efficient exploration in constrained reinforcement learning.
no code implementations • 13 Nov 2022 • Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane
In this paper, we introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called the Beta-spread property.
no code implementations • 2 Jan 2023 • Pratik Gajane
We study the problem of preserving privacy while still providing high utility in sequential decision-making scenarios in a changing environment.
no code implementations • 21 Feb 2023 • Jiong Li, Pratik Gajane
Sparsity of rewards while applying a deep reinforcement learning method negatively affects its sample-efficiency.
no code implementations • 1 Mar 2023 • Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane
In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays.
no code implementations • 27 Sep 2023 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting.
no code implementations • 29 Feb 2024 • Pratik Gajane, Sean Newman, Mykola Pechenizkiy, John D. Piette
In this article, we study gender fairness in personalized pain care recommendations using a real-world application of reinforcement learning (Piette et al., 2022a).