no code implementations • 11 Mar 2024 • Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor
We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs).
no code implementations • 3 Sep 2023 • Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor
In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set.
no code implementations • 9 Jun 2023 • Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor
Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel.
no code implementations • NeurIPS 2023 • Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor
We provide a closed-form expression for the worst occupation measure.
no code implementations • 31 Jan 2023 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor
We present an efficient robust value iteration for \texttt{s}-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to that of standard (non-robust) MDPs, which is significantly faster than any existing method.
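To make the comparison with standard value iteration concrete, here is a minimal sketch of robust value iteration, not the paper's method: it assumes an (s,a)-rectangular reward-uncertainty ball of radius `alpha`, under which the worst case simply subtracts `alpha` from every reward, so each robust Bellman backup costs the same as a standard one.

```python
import numpy as np

def robust_value_iteration(P, R, gamma=0.9, alpha=0.1, iters=500):
    """Toy robust VI sketch (illustrative assumption, not the paper's algorithm).

    P: (A, S, S) transition kernel, R: (S, A) nominal rewards,
    alpha: radius of an L-infinity reward-uncertainty ball per (s, a).
    """
    S = R.shape[0]
    V = np.zeros(S)
    for _ in range(iters):
        # Worst-case reward is R - alpha; the backup is otherwise standard:
        # Q[s, a] = (R[s, a] - alpha) + gamma * sum_s' P[a, s, s'] * V[s']
        Q = (R - alpha) + gamma * np.einsum("asn,n->sa", P, V)
        V = Q.max(axis=1)
    return V
```

Because the worst case here only shifts all rewards by a constant, the robust optimal value equals the nominal one minus `alpha / (1 - gamma)`, which makes the sketch easy to sanity-check.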
no code implementations • 3 Oct 2022 • Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor
The policy gradient theorem has proven to be a cornerstone of Linear RL due to its elegance and ease of implementation.
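As a minimal illustration of the theorem, in its simplest one-step (bandit) form with a softmax policy, an assumed toy setting rather than the paper's robust-MDP one, the likelihood-ratio gradient matches a finite-difference gradient of the expected reward:

```python
import numpy as np

r = np.array([1.0, 0.0, 0.5])        # per-action rewards (illustrative)
theta = np.array([0.2, -0.1, 0.3])   # softmax logits

def pi(theta):
    e = np.exp(theta - theta.max())  # stable softmax
    return e / e.sum()

def J(theta):
    return pi(theta) @ r             # expected reward under the policy

# Policy-gradient form: grad J = sum_a pi(a) * grad log pi(a) * r(a),
# where grad log pi(a) = e_a - pi for a softmax policy.
p = pi(theta)
pg = sum(p[a] * ((np.eye(3)[a] - p) * r[a]) for a in range(3))

# Finite-difference check of grad J
eps = 1e-6
fd = np.array([(J(theta + eps * np.eye(3)[i]) - J(theta - eps * np.eye(3)[i]))
               / (2 * eps) for i in range(3)])
print(np.allclose(pg, fd, atol=1e-6))
```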
1 code implementation • 28 May 2022 • Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor
However, it remains unclear how to exploit this equivalence to perform policy improvement steps and obtain the optimal value function or policy.
no code implementations • 30 Jan 2022 • Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor
The key to this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces.
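A hedged sketch of what the value space looks like (illustrative toy MDP, not taken from the paper): for a two-state, two-action MDP, enumerate all deterministic policies and solve V_pi = (I - gamma * P_pi)^{-1} r_pi. Each V_pi is a point in R^2; fixing the action at one state while varying the other traces the line segments (hypersurfaces in general) whose unions bound the value space.

```python
import numpy as np
from itertools import product

gamma = 0.9
P = np.zeros((2, 2, 2))        # P[a, s, s'] transition kernel
P[0] = np.eye(2)               # action 0: stay
P[1] = np.eye(2)[::-1]         # action 1: swap states
R = np.array([[1.0, 0.0],      # R[s, a] rewards (illustrative)
              [0.0, 0.5]])

def value_of(policy):
    """policy: tuple (a_0, a_1) giving the action taken in each state."""
    P_pi = np.array([P[policy[s], s] for s in range(2)])
    r_pi = np.array([R[s, policy[s]] for s in range(2)])
    # Solve the Bellman equation V = r_pi + gamma * P_pi @ V
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

for policy in product(range(2), repeat=2):
    print(policy, value_of(policy))
```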