Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback

11 Mar 2018 Tanner Fiez Shreyas Sekar Lillian J. Ratliff

We study a multi-armed bandit problem in a dynamic environment where arm rewards evolve in a correlated fashion according to a Markov chain. Different than much of the work on related problems, in our formulation a learning algorithm does not have access to either a priori information or observations of the state of the Markov chain and only observes smoothed reward feedback following time intervals we refer to as epochs... (read more)

PDF Abstract

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper

🤖 No Methods Found Help the community by adding them if they're not listed; e.g. Deep Residual Learning for Image Recognition uses ResNet