2 Jun 2018 • Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross
We develop several novel unbiased estimators for the entropy bonus and its gradient.
Atari Games reinforcement-learning +1