no code implementations • 1 Jul 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Zheng, Gang Pan
Alternatively, derivative-free methods treat the optimization process as a black box and show robustness and stability in learning continuous control tasks, but they are not data-efficient.
no code implementations • 17 May 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Pan
However, existing off-policy learning methods based on probabilistic policy measurement are inefficient at utilizing traces under a greedy target policy, which makes them ineffective for control problems.