Search Results for author: Mehdi Jafarnia-Jahromi

Found 9 papers, 2 papers with code

A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent

no code implementations 8 Sep 2021 Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar

In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of $O(HS\sqrt{AT})$ in infinite-horizon zero-sum stochastic games with the average-reward criterion.

Reinforcement Learning (RL)

Online Learning for Cooperative Multi-Player Multi-Armed Bandits

no code implementations 7 Sep 2021 William Chang, Mehdi Jafarnia-Jahromi, Rahul Jain

For the first setting, we propose a UCB-inspired algorithm that achieves $O(\log T)$ regret whether the rewards are IID or Markovian.

Multi-Armed Bandits

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

no code implementations NeurIPS 2021 Liyu Chen, Mehdi Jafarnia-Jahromi, Rahul Jain, Haipeng Luo

We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured.

Online Learning for Stochastic Shortest Path Model via Posterior Sampling

no code implementations 9 Jun 2021 Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, Haipeng Luo

We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state.

Reinforcement Learning (RL)

Online Learning for Unknown Partially Observable MDPs

no code implementations 25 Feb 2021 Mehdi Jafarnia-Jahromi, Rahul Jain, Ashutosh Nayyar

Learning optimal controllers for POMDPs when the model is unknown is harder.

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

no code implementations 23 Jul 2020 Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation.

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret

no code implementations 8 Jun 2020 Mehdi Jafarnia-Jahromi, Chen-Yu Wei, Rahul Jain, Haipeng Luo

Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computation efficiency, and the flexibility to combine with function approximation.

Q-Learning reinforcement-learning
