1 code implementation • 27 Oct 2023 • Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration.
1 code implementation • 5 Mar 2023 • Zaiyan Xu, Kishan Panaganti, Dileep Kalathil
We formulate this as a distributionally robust reinforcement learning (DR-RL) problem, where the objective is to learn a policy that maximizes the value function against the worst-case stochastic model of the environment within an uncertainty set.
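In standard notation (the symbols below are assumed, not taken from the abstract), the max-min objective described here is typically written as:

```latex
\pi^{*} \in \arg\max_{\pi}\; \min_{P \in \mathcal{P}}\;
  \mathbb{E}^{\pi}_{P}\!\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_t\Big],
```

where $\mathcal{P}$ is the uncertainty set of transition models, $\gamma$ is the discount factor, and the inner minimization picks the worst possible model for the candidate policy $\pi$.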
1 code implementation • 28 Nov 2022 • Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan
In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions.
1 code implementation • 10 Aug 2022 • Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters.
1 code implementation • 18 Dec 2021 • Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, Bani Mallick
We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy.
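As background for the OPE setting, here is a minimal sketch of the standard inverse-propensity-scoring (IPS) estimator for contextual bandits; this illustrates the problem, not necessarily the estimator proposed in the paper, and all names are illustrative.

```python
import numpy as np

def ips_estimate(rewards, target_probs, logging_probs):
    """Estimate the target policy's value from logged bandit data.

    rewards[i]       : reward observed for the i-th logged action
    target_probs[i]  : probability the target policy assigns to that action
    logging_probs[i] : probability the logging policy assigned to it
    """
    # Importance weights correct for the mismatch between the two policies.
    weights = np.asarray(target_probs) / np.asarray(logging_probs)
    return float(np.mean(weights * np.asarray(rewards)))

# Example: logging policy uniform over 2 actions; target always picks action 0.
rewards = [1.0, 0.0, 1.0, 1.0]
logged_actions = [0, 1, 0, 0]
target_probs = [1.0 if a == 0 else 0.0 for a in logged_actions]
logging_probs = [0.5] * 4
print(ips_estimate(rewards, target_probs, logging_probs))  # → 1.5
```

IPS is unbiased when the logging probabilities are known and bounded away from zero, which is why OPE methods are usually compared against it.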
1 code implementation • 2 Dec 2021 • Kishan Panaganti, Dileep Kalathil
For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm.
no code implementations • 20 Jun 2020 • Kishan Panaganti, Dileep Kalathil
We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.
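For context, a minimal sketch of classical least-squares temporal-difference (LSTD) policy evaluation with linear features is shown below; the paper's Robust Least Squares Policy Evaluation additionally handles model uncertainty, which this plain version does not, and the helper names are hypothetical.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.9):
    """Plain LSTD(0) policy evaluation.

    transitions : list of (s, r, s_next) tuples collected under the policy
    phi         : feature map, s -> np.ndarray of shape (d,)
    Returns weights w so that V(s) is approximated by phi(s) @ w.
    """
    d = phi(transitions[0][0]).shape[0]
    A = np.zeros((d, d))
    b = np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate the TD system
        b += r * f
    return np.linalg.solve(A, b)

# Example: a single absorbing state with reward 1 gives V = 1/(1 - gamma) = 10.
phi = lambda s: np.array([1.0])
w = lstd([(0, 1.0, 0)] * 5, phi, gamma=0.9)
print(round(float(w[0]), 6))  # → 10.0
```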
no code implementations • 3 Mar 2020 • Kishan Panaganti, Dileep Kalathil
We propose the Finitely Parameterized Upper Confidence Bound (FP-UCB) algorithm, a simple and easy-to-implement method that uses information about the underlying parameter set for faster learning.
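As background for the confidence-bound idea, here is a sketch of the standard UCB1 algorithm for a multi-armed bandit; per the abstract, FP-UCB additionally exploits knowledge of a finite parameter set, which this generic version omits, and the bandit instance below is hypothetical.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run UCB1: play the arm with the highest optimistic value estimate."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialize
        else:
            # Empirical mean plus an exploration bonus that shrinks with counts.
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts

# Hypothetical two-armed Bernoulli bandit: arm 1 pays off more often on average.
random.seed(0)
counts = ucb1(lambda a: 1.0 if random.random() < (0.3, 0.7)[a] else 0.0,
              n_arms=2, horizon=500)
print(counts)  # arm 1 should receive most of the pulls
```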