no code implementations • 16 Jul 2015 • Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos
If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance.
no code implementations • NeurIPS 2007 • András Antos, Csaba Szepesvári, Rémi Munos
We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy.