no code implementations • 9 Jun 2023 • Erfan Seyedsalehi, Nima Akbarzadeh, Amit Sinha, Aditya Mahajan
In spite of the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a complete theoretical understanding is still lacking.
no code implementations • 7 Feb 2022 • Nima Akbarzadeh, Aditya Mahajan
In particular, we consider a restless bandit model, and propose a Thompson-sampling based learning algorithm which is tuned to the underlying structure of the model.
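The abstract gives only the high-level idea, so the following is a minimal illustrative sketch of Thompson sampling on a restless bandit, not the paper's algorithm: every name, model choice, and parameter below is an assumption. Each arm is a hidden two-state Markov chain, and the learner keeps Beta posteriors over the transition probabilities it can observe.

```python
import numpy as np

# Illustrative sketch only (details are not in the abstract): each arm is a
# hidden two-state chain (0 = bad, 1 = good); pulling an arm reveals its
# state and pays that state as reward. The learner keeps Beta posteriors
# over P(next = good | current state) for each arm.
rng = np.random.default_rng(0)

n_arms, horizon = 3, 200
# True (unknown) P(next = good | current state), rows indexed by arm.
true_p = rng.uniform(0.2, 0.8, size=(n_arms, 2))

# Beta posterior counts for each (arm, current state) pair.
alpha = np.ones((n_arms, 2))
beta = np.ones((n_arms, 2))

state = rng.integers(0, 2, size=n_arms)   # hidden states
total_reward = 0
pulls = np.zeros(n_arms, dtype=int)

for t in range(horizon):
    # Thompson step: sample a model from the posterior, score each arm by
    # the sampled stationary probability of its good state, pull the best.
    p = rng.beta(alpha, beta)             # sampled P(good | state)
    stat = p[:, 0] / (1.0 - p[:, 1] + p[:, 0])
    arm = int(np.argmax(stat))

    s = state[arm]                        # observe the pulled arm only
    total_reward += s
    pulls[arm] += 1

    # Restless dynamics: every arm's state evolves, but only the pulled
    # arm's transition is observed, so only its posterior is updated.
    next_states = (rng.random(n_arms) < true_p[np.arange(n_arms), state]).astype(int)
    if next_states[arm] == 1:
        alpha[arm, s] += 1
    else:
        beta[arm, s] += 1
    state = next_states
```

The stationary probability `a / (1 - b + a)` (with `a = P(good|bad)`, `b = P(good|good)`) is just one convenient score for a two-state chain; the paper's structured Thompson-sampling algorithm is more refined than this.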
no code implementations • 12 Apr 2021 • Nima Akbarzadeh, Aditya Mahajan
We consider restless bandits with a general state space under partial observability, with two observation models: first, the state of each bandit is never observable; second, the state of each bandit is observable only when that bandit is chosen.
no code implementations • 13 Aug 2020 • Nima Akbarzadeh, Aditya Mahajan
We then revisit a previously proposed algorithm, known as the adaptive greedy algorithm, which computes the Whittle index for a subclass of restless bandits.
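For context, the Whittle index of a state is the passive-action subsidy at which active and passive actions become equally attractive there. The sketch below computes it by the textbook route of bisection over the subsidy with value iteration inside, not by the paper's adaptive greedy algorithm; the arm model and all numbers are made-up, and indexability is assumed rather than checked.

```python
import numpy as np

def whittle_index(P0, P1, r0, r1, s, discount=0.9,
                  lo=-10.0, hi=10.0, iters=40, vi_iters=200):
    """Bisection on the passive subsidy lam for state s of one restless arm.

    P0/P1: passive/active transition matrices; r0/r1: passive/active rewards.
    Illustrative only: indexability is assumed, not verified.
    """
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        V = np.zeros(len(r0))
        for _ in range(vi_iters):
            q_pass = r0 + lam + discount * (P0 @ V)
            q_act = r1 + discount * (P1 @ V)
            V = np.maximum(q_pass, q_act)
        if q_act[s] > q_pass[s]:
            lo = lam   # subsidy too small: active still preferred at s
        else:
            hi = lam
    return 0.5 * (lo + hi)

# Tiny machine-maintenance-style arm (made-up numbers): states degrade
# under the passive action; the active action restarts the arm at a cost.
P0 = np.array([[0.9, 0.1, 0.0],
               [0.0, 0.9, 0.1],
               [0.0, 0.0, 1.0]])
P1 = np.array([[1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]])
r0 = np.array([1.0, 0.6, 0.2])        # reward decays as the state degrades
r1 = r0 - 0.5                         # restart pays the same reward minus a cost
indices = [whittle_index(P0, P1, r0, r1, s) for s in range(3)]
```

The point of the adaptive greedy algorithm revisited in the paper is to avoid this per-state bisection; the sketch only shows what object is being computed.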
no code implementations • 21 May 2016 • Nima Akbarzadeh, Cem Tekin
In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov decision process (MDP) with two actions (arms): a continuation action, which moves the learner randomly to a state near its current state, and a terminal action, which moves the learner directly into one of the two terminal states (the goal state or the dead-end state).
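The round structure described above can be simulated directly. The sketch below is a toy model under assumed dynamics (the local-step distribution, the terminal-action success probability, and the example policy are all invented for illustration, not taken from the paper).

```python
import random

# Toy GRBP round: interior states 1..n_states plus two terminal states.
# The continuation action takes a random local step; the terminal action
# jumps straight to goal or dead-end. All probabilities are assumptions.
GOAL, DEAD_END = "goal", "dead-end"

def play_round(n_states=10, start=5, policy=None, rng=None):
    """Run one round; returns (terminal_state, steps_taken)."""
    rng = rng or random.Random(0)
    s, steps = start, 0
    while True:
        steps += 1
        # Example threshold policy: keep continuing until the state is
        # high enough, then take the terminal action.
        action = policy(s) if policy else ("terminal" if s >= 8 else "continue")
        if action == "terminal":
            # Assumed: the terminal action succeeds with probability
            # increasing in the current state.
            return (GOAL if rng.random() < s / n_states else DEAD_END), steps
        # Continuation: random walk around the current state.
        s = min(n_states, max(1, s + rng.choice([-1, 0, 1])))
        if steps > 1000:          # safety cap for this sketch
            return DEAD_END, steps

outcome, steps = play_round()
```

A threshold policy is used only as a plausible example of the continue-vs-terminate trade-off each round poses; the learning problem in the paper is to discover a good policy without knowing the dynamics.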