NeurIPS 2012 • Tsuyoshi Ueno, Kohei Hayashi, Takashi Washio, Yoshinobu Kawahara
Reinforcement learning (RL) methods based on direct policy search (DPS) have been actively studied as an efficient approach to complicated Markov decision processes (MDPs).