NeurIPS 2010 • Atsushi Miyamae, Yuichi Nagata, Isao Ono, Shigenobu Kobayashi
In this paper, we propose an efficient algorithm for estimating the natural policy gradient with parameter-based exploration; this algorithm samples directly in the parameter space.
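
To make "sampling directly in the parameter space" concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a diagonal Gaussian search distribution over policy parameters, for which the Fisher information is known in closed form, so the natural gradient can be estimated without inverting a matrix. The function name `natural_gradient_step`, the mean-reward baseline, the step size, and the `reward_fn` interface are all illustrative assumptions.

```python
import random


def natural_gradient_step(mu, sigma, reward_fn, n_samples=100, lr=0.1, rng=random):
    """One natural-gradient update of a diagonal Gaussian over policy parameters.

    Illustrative sketch only: the diagonal Gaussian, the mean-reward baseline,
    and the fixed step size are assumptions, not details taken from the paper.
    """
    dim = len(mu)
    # Exploration happens in parameter space: draw whole parameter vectors.
    thetas = [[rng.gauss(mu[i], sigma[i]) for i in range(dim)]
              for _ in range(n_samples)]
    rewards = [reward_fn(theta) for theta in thetas]
    baseline = sum(rewards) / n_samples  # simple variance-reduction baseline

    g_mu = [0.0] * dim
    g_sigma = [0.0] * dim
    for theta, reward in zip(thetas, rewards):
        advantage = reward - baseline
        for i in range(dim):
            d = theta[i] - mu[i]
            # For N(mu, sigma^2) the Fisher information is 1/sigma^2 for mu and
            # 2/sigma^2 for sigma, so the natural score functions simplify to
            # d and (d^2 - sigma^2) / (2 * sigma) respectively.
            g_mu[i] += advantage * d / n_samples
            g_sigma[i] += advantage * (d * d - sigma[i] ** 2) / (2.0 * sigma[i]) / n_samples

    new_mu = [m + lr * g for m, g in zip(mu, g_mu)]
    # Keep the standard deviations strictly positive.
    new_sigma = [max(1e-3, s + lr * g) for s, g in zip(sigma, g_sigma)]
    return new_mu, new_sigma
```

In use, `reward_fn` would run one episode with the sampled parameters and return its total reward; repeated calls to `natural_gradient_step` then move the mean toward high-reward parameters while shrinking the exploration spread. A full-covariance distribution, which captures correlations between parameters, would need a different parameterization than this diagonal one.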