no code implementations • 16 Mar 2023 • Ayush Aniket, Arpan Chattopadhyay
We study learning in a periodic Markov Decision Process (MDP), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average-reward maximization setting.
no code implementations • 25 Jul 2022 • Ayush Aniket, Arpan Chattopadhyay
We study learning in a periodic Markov Decision Process (MDP), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average-reward maximization setting.