Search Results for author: Denis Denisov

Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits

We consider a policy gradient algorithm applied to a finite-arm bandit problem with Bernoulli rewards.
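The snippet above does not give the update rule. As a rough illustration only, here is a minimal sketch of one common policy gradient method for Bernoulli bandits: the softmax "gradient bandit" update (REINFORCE with a reward baseline). This is a swapped-in, textbook technique and is not necessarily the algorithm analyzed in the paper. The arm means, the constant `step_size`, the seed, and the function names `softmax` and `gradient_bandit` are all hypothetical.

```python
import math
import random


def softmax(prefs):
    """Map arm preferences to action probabilities (shifted for numerical stability)."""
    m = max(prefs)
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_bandit(means, steps, step_size=0.1, seed=0):
    """Run a softmax policy gradient learner on Bernoulli arms (hypothetical sketch).

    `means` holds the assumed success probability of each arm.
    Returns the final action probabilities and the pseudo-regret, i.e. the
    sum over rounds of (best mean - mean of the arm actually pulled).
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    prefs = [0.0] * k   # policy parameters: one preference per arm
    baseline = 0.0      # average of rewards seen before the current round
    regret = 0.0
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        regret += best - means[arm]
        # Score function: d log pi(arm) / d prefs[b] = 1{b == arm} - probs[b].
        advantage = reward - baseline
        for b in range(k):
            indicator = 1.0 if b == arm else 0.0
            prefs[b] += step_size * advantage * (indicator - probs[b])
        # Update the baseline after using it, so it does not depend on this round's action.
        baseline += (reward - baseline) / t
    return softmax(prefs), regret


probs, regret = gradient_bandit([0.2, 0.8], steps=5000)
print(probs, regret)
```

The constant `step_size` is a simplifying choice for this toy; a regret analysis would need to account for how the learning rate is set, which this sketch does not attempt.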
