no code implementations • NeurIPS 2013 • Paul Wagner
Approximate dynamic programming approaches to the reinforcement learning problem are often categorized into greedy value function methods and value-based policy gradient methods.
no code implementations • NeurIPS 2011 • Paul Wagner
We take a fresh view of this phenomenon by casting a considerable subset of the former approach as a limiting special case of the latter.