Search Results for author: Pratik Gajane

Found 9 papers, 1 paper with code

Autonomous exploration for navigating in non-stationary CMPs

no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.

Variational Regret Bounds for Reinforcement Learning

no code implementations • 14 May 2019 • Pratik Gajane, Ronald Ortner, Peter Auer

This is the first variational regret bound for the general reinforcement learning setting.

General Reinforcement Learning

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time.

Counterfactual Learning for Machine Translation: Degeneracies and Solutions

no code implementations • 23 Nov 2017 • Carolin Lawrence, Pratik Gajane, Stefan Riezler

Counterfactual learning is a natural scenario to improve web-based machine translation services by offline learning from feedback logged during user interactions.

Machine Translation • Translation

On Formalizing Fairness in Prediction with Machine Learning

no code implementations • 9 Oct 2017 • Pratik Gajane, Mykola Pechenizkiy

Machine learning algorithms for prediction are increasingly being used in critical decisions affecting human lives.

Fairness

Corrupt Bandits for Preserving Local Privacy

no code implementations • 16 Aug 2017 • Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann

In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of transformation of these rewards through a stochastic corruption process with known parameters.

Recommendation Systems

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

no code implementations • 15 Jan 2016 • Pratik Gajane, Tanguy Urvoy, Fabrice Clérot

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms.

Information Retrieval

Utility-based Dueling Bandits as a Partial Monitoring Game

no code implementations • 10 Jul 2015 • Pratik Gajane, Tanguy Urvoy

Partial monitoring is a generic framework for sequential decision-making with incomplete feedback.

Decision Making
