Exploration by Uncertainty in Reward Space

27 Sep 2018 · Wei-Yang Qu, Yang Yu, Tang-Jie Lv, Ying-Feng Chen, Chang-Jie Fan ·

Efficient exploration plays a key role in reinforcement learning tasks. Commonly used dithering strategies, such as-greedy, try to explore the action-state space randomly; this can lead to large demand for samples. In this paper, We propose an exploration method based on the uncertainty in reward space. There are two policies in this approach, the exploration policy is used for exploratory sampling in the environment, then the benchmark policy try to update by the data proven by the exploration policy. Benchmark policy is used to provide the uncertainty in reward space, e.g. td-error, which guides the exploration policy updating. We apply our method on two grid-world environments and four Atari games. Experiment results show that our method improves learning speed and have a better performance than baseline policies

PDF Abstract