Search Results for author: Pratik Gajane

Found 19 papers, 3 papers with code

Adversarial Multi-dueling Bandits

no code implementations • 18 Jun 2024 • Pratik Gajane

We prove that the expected cumulative $T$-round regret of MiDEX compared to a Borda-winner from a set of $K$ arms is upper bounded by $O((K \log K)^{1/3} T^{2/3})$.

Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain

no code implementations • 29 Feb 2024 • Pratik Gajane, Sean Newman, Mykola Pechenizkiy, John D. Piette

In this article, we study gender fairness in personalized pain care recommendations using a real-world application of reinforcement learning (Piette et al., 2022a).

Decision Making, Fairness, +4

Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

no code implementations • 27 Sep 2023 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting.

Efficient Exploration

Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards

no code implementations • 1 Mar 2023 • Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane

In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays.

Decision Making, Multi-Armed Bandits

Local Differential Privacy for Sequential Decision Making in a Changing Environment

no code implementations • 2 Jan 2023 • Pratik Gajane

We study the problem of preserving privacy while still providing high utility in sequential decision making scenarios in a changing environment.

Decision Making, Multi-Armed Bandits

Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards

no code implementations • 13 Nov 2022 • Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane

In this paper, we introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called the Beta-spread property.

Multi-Armed Bandits

Survey on Fair Reinforcement Learning: Theory and Practice

no code implementations • 20 May 2022 • Pratik Gajane, Akrati Saxena, Maryam Tavakol, George Fletcher, Mykola Pechenizkiy

In this article, we provide an extensive overview of fairness approaches that have been implemented via a reinforcement learning (RL) framework.

Decision Making, Fairness, +3

The Impact of Batch Learning in Stochastic Linear Bandits

1 code implementation • 14 Feb 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Our main theoretical results show that the impact of batch learning is a multiplicative factor of batch size relative to the regret of online behavior.

Autonomous exploration for navigating in non-stationary CMPs

no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.

Navigate

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time.

Reinforcement Learning (RL)

Counterfactual Learning for Machine Translation: Degeneracies and Solutions

no code implementations • 23 Nov 2017 • Carolin Lawrence, Pratik Gajane, Stefan Riezler

Counterfactual learning is a natural scenario to improve web-based machine translation services by offline learning from feedback logged during user interactions.

Counterfactual, Machine Translation, +1

On Formalizing Fairness in Prediction with Machine Learning

no code implementations • 9 Oct 2017 • Pratik Gajane, Mykola Pechenizkiy

Machine learning algorithms for prediction are increasingly being used in critical decisions affecting human lives.

BIG-bench Machine Learning, Fairness

Corrupt Bandits for Preserving Local Privacy

no code implementations • 16 Aug 2017 • Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann

In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of transformation of these rewards through a stochastic corruption process with known parameters.

Recommendation Systems

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

no code implementations • 15 Jan 2016 • Pratik Gajane, Tanguy Urvoy, Fabrice Clérot

We study the $K$-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms.

Information Retrieval, Retrieval

Utility-based Dueling Bandits as a Partial Monitoring Game

no code implementations • 10 Jul 2015 • Pratik Gajane, Tanguy Urvoy

Partial monitoring is a generic framework for sequential decision-making with incomplete feedback.

Decision Making
