ICLR Workshop drlStructPred 2019 • Bowen Tan*, Zhiting Hu*, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
We present a generalized entropy-regularized policy optimization formulation and show that apparently divergent learning algorithms can all be reformulated as special instances of this framework, differing only in the configuration of the reward function and a couple of hyperparameters.