NeurIPS 2009 • Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos
We derive an infinitesimal perturbation analysis (IPA) estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization.
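Infinitesimal perturbation analysis (IPA) estimates a gradient by differentiating each simulated sample path with respect to the parameter, where the randomness that drives the path does not depend on that parameter. The sketch below shows only this general idea on a toy objective. It is not the HMM/particle-filter log-likelihood estimator from the paper. The model `X = theta + sigma * eps`, the objective `J(theta) = E[-(X - target)**2]`, and the function names `ipa_gradient` and `gradient_ascent` are all hypothetical, chosen so that the exact gradient `-2 * (theta - target)` is known and the estimate can be checked.

```python
import random


def ipa_gradient(theta, target=2.0, sigma=1.0, n=10_000, rng=None):
    """Hypothetical toy: IPA (pathwise) estimate of dJ/dtheta for
    J(theta) = E[-(X - target)**2], with X = theta + sigma * eps, eps ~ N(0, 1).
    The noise eps does not depend on theta, so each path can be differentiated."""
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)    # driving noise, independent of theta
        x = theta + sigma * eps      # sample path as a function of theta
        dx_dtheta = 1.0              # derivative of the path w.r.t. theta
        # Chain rule along the path: d/dtheta of -(x - target)**2
        total += -2.0 * (x - target) * dx_dtheta
    return total / n                 # average of per-path derivatives


def gradient_ascent(theta0, steps=50, lr=0.1, seed=0):
    """Use the noisy IPA gradient in plain stochastic gradient ascent on J."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        theta += lr * ipa_gradient(theta, rng=rng)
    return theta
```

For example, `ipa_gradient(0.0)` should be close to the exact value `4.0`, and `gradient_ascent(0.0)` should approach the maximizer `target = 2.0`. The paper applies the same pathwise idea to the filtering distribution of an HMM rather than to a closed-form toy like this one.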
NeurIPS 2008 • Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos
Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces.