no code implementations • 8 Feb 2022 • Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant
We study the risk-sensitive exponential cost MDP formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies.
no code implementations • 14 Sep 2020 • Arghyadip Roy, Sanjay Shakkottai, R. Srikant
rewards are a special case of Markov rewards and it is difficult to design an algorithm that works well independent of whether the underlying model is truly Markovian or i.i.d.
no code implementations • 21 Dec 2019 • Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar
To overcome the curses of dimensionality and modeling that Dynamic Programming (DP) methods face when solving Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice.
no code implementations • 28 Nov 2018 • Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar
In this paper, we propose a new RL algorithm that exploits the known threshold structure of the optimal policy during learning, thereby reducing the feasible policy space.