no code implementations • 15 Sep 2020 • Yubo Huang, Xuechun Wang, Luobao Zou, Zhiwei Zhuang, Weidong Zhang
In reinforcement learning (RL), we always expect the agent to explore as many states as possible in the initial stage of training and exploit the explored information in the subsequent stage to discover the most returnable trajectory.