no code implementations • 4 May 2021 • Lixin Zou, Long Xia, Linfang Hou, Xiangyu Zhao, Dawei Yin
This work introduces a practical, data-efficient policy learning method, named Variance-Bonus Monte Carlo Tree Search (VB-MCTS), which can cope with very little data and facilitate learning from scratch in only a few trials.
no code implementations • 1 Jun 2020 • Linfang Hou, Liang Pang, Xin Hong, Yanyan Lan, Zhi-Ming Ma, Dawei Yin
Robust Reinforcement Learning aims to find the optimal policy with some degree of robustness to environmental dynamics.