Search Results for author: Paul Wagner

Found 2 papers, 0 papers with code

Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

no code implementations · NeurIPS 2013 · Paul Wagner

Approximate dynamic programming approaches to the reinforcement learning problem are often categorized into greedy value function methods and value-based policy gradient methods.

Policy Gradient Methods

A reinterpretation of the policy oscillation phenomenon in approximate policy iteration

no code implementations · NeurIPS 2011 · Paul Wagner

We take a fresh view of this phenomenon by casting a considerable subset of the former approach as a limiting special case of the latter.

Policy Gradient Methods
