1 code implementation • 5 Mar 2023 • Zaiyan Xu, Kishan Panaganti, Dileep Kalathil
We formulate this as a distributionally robust reinforcement learning (DR-RL) problem where the objective is to learn the policy which maximizes the value function against the worst possible stochastic model of the environment in an uncertainty set.
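As a rough illustration of the max-min objective described above (not the paper's algorithm), the sketch below runs tabular value iteration in which each backup takes the worst case over a small finite set of transition models. The function name `robust_value_iteration`, the finite uncertainty set, and the per-state-action (rectangular) worst case are all assumptions made for this example.

```python
def robust_value_iteration(models, rewards, gamma=0.9, iters=200):
    """Tabular robust value iteration (illustrative sketch).

    models  -- hypothetical finite uncertainty set: a list of transition
               tables P with P[s][a][s2] = probability of s -> s2 under a
    rewards -- rewards[s][a]
    The worst case is taken separately for each (state, action) pair,
    i.e. an (s, a)-rectangular uncertainty set is assumed.
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    v = [0.0] * n_states
    for _ in range(iters):
        new_v = []
        for s in range(n_states):
            best = float("-inf")
            for a in range(n_actions):
                # Adversary picks the model with the lowest expected next value.
                worst = min(
                    sum(P[s][a][s2] * v[s2] for s2 in range(n_states))
                    for P in models
                )
                # Agent picks the action with the best worst-case return.
                best = max(best, rewards[s][a] + gamma * worst)
            new_v.append(best)
        v = new_v
    return v


# Toy example: two states, one action; state 1 is absorbing with zero reward.
rewards = [[1.0], [0.0]]
stay = [[[1.0, 0.0]], [[0.0, 1.0]]]    # state 0 stays in state 0
leave = [[[0.0, 1.0]], [[0.0, 1.0]]]   # state 0 jumps to state 1

nominal = robust_value_iteration([stay], rewards)
robust = robust_value_iteration([stay, leave], rewards)
```

In this toy example the robust value of state 0 is lower than its nominal value, because the adversary can always pick the model that moves the agent to the zero-reward state.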
no code implementations • 23 Feb 2023 • Ting-Jui Chang, Sapana Chaudhary, Dileep Kalathil, Shahin Shahrampour
We prove that for convex functions, D-Safe-OGD achieves a dynamic regret bound of $O(T^{2/3} \sqrt{\log T} + T^{1/3}C_T^*)$, where $C_T^*$ denotes the path-length of the best minimizer sequence.
no code implementations • 13 Oct 2022 • Aayushman Sharma, Zirui Mao, Haiying Yang, Suman Chakravorty, Michael J Demkowicz, Dileep Kalathil
In this paper, we consider the optimal control of material micro-structures.
1 code implementation • 26 Sep 2022 • Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai
Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy.
no code implementations • 26 Aug 2022 • Vasudev Gohil, Satwik Patnaik, Hao Guo, Dileep Kalathil, Jeyavijayan Rajendran
Insertion of hardware Trojans (HTs) in integrated circuits is a pernicious threat.
no code implementations • 18 Aug 2022 • Deepan Muthirayan, Dileep Kalathil, Pramod P. Khargonekar
We show that when the number of tasks is sufficiently large, our proposed approach achieves a meta-regret that is smaller by a factor of $D/D^{*}$ compared to an independent-learning online control algorithm that does not learn across tasks, where $D$ is a problem constant and $D^{*}$ is a scalar that decreases as the similarity between tasks increases.
1 code implementation • 10 Aug 2022 • Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh
The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters.
no code implementations • 15 Jul 2022 • Amit Jena, Tong Huang, S. Sivaranjani, Dileep Kalathil, Le Xie
One standard approach to estimate the stability region of a general nonlinear system is to first find a Lyapunov function for the system and characterize its region of attraction as the stability region.
1 code implementation • 10 Jun 2022 • Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off.
no code implementations • 8 Mar 2022 • Rayan El Helou, S. Sivaranjani, Dileep Kalathil, Andrew Schaper, Le Xie
In fact, we find that as little as 11% of heavy duty vehicles in Texas charging simultaneously can lead to significant voltage violations on the transmission network that compromise grid reliability.
1 code implementation • ICLR 2022 • Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep Kalathil, Srinivas Shakkottai
We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored state.
1 code implementation • 18 Dec 2021 • Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, Bani Mallick
We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy.
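For readers unfamiliar with the OPE setup, a standard baseline is inverse propensity scoring (IPS), which reweights each logged reward by the ratio of target to logging action probabilities. The sketch below is a minimal IPS estimator, not the method proposed in this paper; the function name `ips_value` and the toy data are made up for illustration.

```python
def ips_value(rewards, logging_probs, target_probs):
    """Inverse propensity scoring estimate of a target policy's value.

    rewards       -- observed reward for each logged interaction
    logging_probs -- probability the logging policy gave the logged action
    target_probs  -- probability the target policy gives the same action
    Assumes every logging probability is strictly positive.
    """
    weights = [t / l for t, l in zip(target_probs, logging_probs)]
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


# Toy logged data (hypothetical): four interactions.
rewards = [1.0, 0.0, 1.0, 1.0]
logging_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [1.0, 0.0, 0.5, 0.5]

estimate = ips_value(rewards, logging_probs, target_probs)
```

The estimator is unbiased when the logging probabilities are known and positive wherever the target policy puts mass, but its variance grows when the two policies differ sharply.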
1 code implementation • 2 Dec 2021 • Kishan Panaganti, Dileep Kalathil
For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm.
1 code implementation • 1 Dec 2021 • Archana Bura, Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland
Safe reinforcement learning is extremely challenging: not only must the agent explore an unknown environment, it must do so while ensuring no safety constraint violations.
no code implementations • 30 Nov 2021 • Deepan Muthirayan, Jianjun Yuan, Dileep Kalathil, Pramod P. Khargonekar
Specifically, we study the online learning problem where the control algorithm does not know the true system model and has access only to a fixed-length preview of the future cost functions, one that does not grow with the control horizon.
no code implementations • 14 Nov 2021 • Sapana Chaudhary, Dileep Kalathil
We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints.
no code implementations • 31 Oct 2021 • Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian
We propose a new algorithm, policy mirror descent-primal dual (PMD-PD), that can provably achieve a faster $\mathcal{O}(\log(T)/T)$ convergence rate for both the optimality gap and the constraint violation.
no code implementations • 5 Oct 2021 • Akhil Nagariya, Dileep Kalathil, Srikanth Saripalli
Compared to the standard ILQR approach, our proposed approach achieves a 30% and 50% reduction in cross track error in Warthog and Moose, respectively, by utilizing only 30 minutes of real-world driving data.
no code implementations • 13 Sep 2021 • Dongqi Wu, Dileep Kalathil, Miroslav Begovic, Le Xie
This paper introduces PyProD, a Python-based machine learning (ML)-compatible test-bed for evaluating the efficacy of protection schemes in electric distribution grids.
no code implementations • NeurIPS 2021 • Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian
We show that when a strictly safe policy is known, one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$.
no code implementations • 3 Aug 2020 • Rayan El Helou, Dileep Kalathil, Le Xie
In this paper, we introduce a new framework to address the problem of voltage regulation in unbalanced distribution grids with deep photovoltaic penetration.
no code implementations • 1 Aug 2020 • Aria HasanzadeZonuzy, Archana Bura, Dileep Kalathil, Srinivas Shakkottai
Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints.
no code implementations • 21 Jun 2020 • Kiyeob Lee, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai
We introduce a natural refinement to the equilibrium concept that we call Trembling-Hand-Perfect MFE (T-MFE), which allows agents to employ a measure of randomization while accounting for the impact of such randomization on their payoffs.
no code implementations • 20 Jun 2020 • Kishan Panaganti, Dileep Kalathil
We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation.
no code implementations • 5 Mar 2020 • Dongqi Wu, Dileep Kalathil, Miroslav Begovic, Le Xie
This paper introduces a deep reinforcement learning-based architecture for protective relay design in power distribution systems with many distributed energy resources (DERs).
no code implementations • 3 Mar 2020 • Kishan Panaganti, Dileep Kalathil
We propose a simple, easy-to-implement algorithm, which we call the Finitely Parameterized Upper Confidence Bound (FP-UCB) algorithm, that uses information about the underlying parameter set for faster learning.
1 code implementation • 17 Apr 2019 • Ran Wang, Karthikeya Parunandi, Dan Yu, Dileep Kalathil, Suman Chakravorty
This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled, 'open loop - closed loop' approach.
no code implementations • 4 Jan 2019 • Rajarshi Bhattacharyya, Archana Bura, Desik Rengarajan, Mason Rumuly, Bainan Xia, Srinivas Shakkottai, Dileep Kalathil, Ricky K. P. Mok, Amogh Dhamdhere
The predominant use of wireless access networks is for media streaming applications, which are only gaining popularity as ever more devices become available for this purpose.
no code implementations • 4 May 2015 • Naumaan Nayyar, Dileep Kalathil, Rahul Jain
The objective is to design a policy that maximizes the expected reward over a time horizon in the single-player setting and the sum of expected rewards in the multiplayer setting.
no code implementations • 30 Nov 2014 • Dileep Kalathil, Vivek S. Borkar, Rahul Jain
We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown.
no code implementations • 3 Nov 2014 • Dileep Kalathil, Vivek Borkar, Rahul Jain
Firstly, we give a simple and computationally tractable strategy for approachability for Stackelberg stochastic games along the lines of Blackwell's.