no code implementations • 3 Jun 2011 • J. Baxter, P. L. Bartlett, L. Weaver
These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter and Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs.
no code implementations • 1 Jun 2011 • J. Baxter
The central assumption of the model is that the learner is embedded within an environment of related learning tasks.