27 Sep 2018 • Yao Shi, Tian Xia, Guanjun Zhao, Xin Gao
This paper proposes a broadly applicable improvement for reinforcement learning algorithms: it combines policies learned from the original rewards with policies learned from inverse (negative) rewards.