no code implementations • 8 Sep 2021 • Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar
In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of $O(HS\sqrt{AT})$ in infinite-horizon zero-sum stochastic games with the average-reward criterion.
no code implementations • 7 Sep 2021 • William Chang, Mehdi Jafarnia-Jahromi, Rahul Jain
For the first setting, we propose a UCB-inspired algorithm that achieves $O(\log T)$ regret whether the rewards are IID or Markovian.
no code implementations • NeurIPS 2021 • Liyu Chen, Mehdi Jafarnia-Jahromi, Rahul Jain, Haipeng Luo
We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured.
no code implementations • 9 Jun 2021 • Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, Haipeng Luo
We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state.
no code implementations • 25 Feb 2021 • Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar
Learning optimal controllers for POMDPs when the model is unknown is harder.
no code implementations • 23 Jul 2020 • Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain
We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation.
no code implementations • 8 Jun 2020 • Mehdi Jafarnia-Jahromi, Chen-Yu Wei, Rahul Jain, Haipeng Luo
Recently, model-free reinforcement learning has attracted research attention due to its simplicity, its memory and computation efficiency, and the flexibility to combine it with function approximation.
1 code implementation • ICML 2020 • Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, Rahul Jain
Model-free reinforcement learning is known to be memory and computation efficient and more amenable to large-scale problems.
1 code implementation • 25 Dec 2018 • Mehdi Jafarnia-Jahromi, Tasmin Chowdhury, Hsin-Tai Wu, Sayandev Mukherjee
In this paper, Permutation Phase Defense (PPD) is proposed as a novel method to resist adversarial attacks.