A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

18 Jul 2016 · S. A. Murphy, Y. Deng, E. B. Laber, H. R. Maei, R. S. Sutton, K. Witkiewitz

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health...
