Rethinking Deep Policy Gradients via State-Wise Policy Improvement

Deep policy gradient methods form one of the major frameworks in reinforcement learning and have been shown to improve parameterized policies across a variety of tasks and environments. However, recent studies show that key components of deep policy gradient methods, such as gradient estimation, value prediction, and the optimization landscape, fail to reflect this conceptual framework. This paper investigates the mechanism behind deep policy gradient methods through the lens of state-wise policy improvement. Based on fundamental properties of policy improvement, we propose an alternative theoretical framework that reinterprets the deep policy gradient update as training a binary classifier, with labels provided by the advantage function. This framework sidesteps the statistical difficulties in the gradient estimates and predicted values of the deep policy gradient update. Experimental results are included to corroborate the proposed framework.
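
To make the classifier reinterpretation concrete, here is a minimal, hypothetical PyTorch sketch of one way such an update could look: the sign of the estimated advantage serves as a binary label, and the probability the policy assigns to the sampled action is trained with a cross-entropy-style loss. The function name, the use of binary cross-entropy, and the toy data are illustrative assumptions, not the paper's actual algorithm.

```python
import torch
import torch.nn.functional as F

def classification_style_policy_loss(log_probs, advantages):
    """Illustrative sketch (not the paper's exact method): treat the policy
    update as binary classification with advantage-sign labels.

    Args:
        log_probs:  log pi(a_t | s_t) for the sampled actions, shape (N,).
        advantages: estimated advantages A(s_t, a_t), shape (N,).
    """
    # Binary labels: 1 where the sampled action beat the baseline, else 0.
    labels = (advantages > 0).float()
    # Probability the current policy assigns to each sampled action,
    # clamped away from 0 and 1 for numerical stability.
    probs = log_probs.exp().clamp(1e-6, 1.0 - 1e-6)
    # Binary cross-entropy raises the probability of positive-advantage
    # actions and lowers it for negative-advantage ones.
    return F.binary_cross_entropy(probs, labels)

# Toy usage with random data (hypothetical shapes and values).
log_probs = torch.log(torch.rand(32).clamp(1e-6, 1.0))
advantages = torch.randn(32)
loss = classification_style_policy_loss(log_probs, advantages)
```

In contrast to the standard policy gradient estimator, which weights the score function by the raw advantage, this sketch only uses the advantage's sign, which is one way to read the claim that the classifier view avoids the statistical difficulties of noisy advantage magnitudes.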
