Search Results for author: Kishan Panaganti

Found 8 papers, 6 papers with code

Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

1 code implementation • 27 Oct 2023 • Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration.

Offline RL, Reinforcement Learning (RL)

Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning

1 code implementation • 5 Mar 2023 • Zaiyan Xu, Kishan Panaganti, Dileep Kalathil

We formulate this as a distributionally robust reinforcement learning (DR-RL) problem where the objective is to learn the policy which maximizes the value function against the worst possible stochastic model of the environment in an uncertainty set.

reinforcement-learning, Reinforcement Learning (RL)

Personalized Reward Learning with Interaction-Grounded Learning (IGL)

1 code implementation • 28 Nov 2022 • Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan

In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions.

Recommendation Systems

Robust Reinforcement Learning using Offline Data

1 code implementation • 10 Aug 2022 • Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters.

reinforcement-learning, Reinforcement Learning (RL)

Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

1 code implementation • 18 Dec 2021 • Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, Bani Mallick

We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy.

Multi-Armed Bandits, Off-policy evaluation

Sample Complexity of Robust Reinforcement Learning with a Generative Model

1 code implementation • 2 Dec 2021 • Kishan Panaganti, Dileep Kalathil

For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm.

Model-based Reinforcement Learning, reinforcement-learning, +1

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

no code implementations • 20 Jun 2020 • Kishan Panaganti, Dileep Kalathil

We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.

OpenAI Gym, reinforcement-learning, +1

Bounded Regret for Finitely Parameterized Multi-Armed Bandits

no code implementations • 3 Mar 2020 • Kishan Panaganti, Dileep Kalathil

We propose a simple, easy-to-implement algorithm, which we call the Finitely Parameterized Upper Confidence Bound (FP-UCB) algorithm, that uses information about the underlying parameter set for faster learning.

Multi-Armed Bandits
