no code implementations • 27 Sep 2018 • Wei-Yang Qu, Yang Yu, Tang-Jie Lv, Ying-Feng Chen, Chang-Jie Fan
There are two policies in this approach, the exploration policy is used for exploratory sampling in the environment, then the benchmark policy try to update by the data proven by the exploration policy.