Interpreting Distributional Reinforcement Learning: Regularization and Optimization Perspectives

7 Oct 2021 · Ke Sun, Yingnan Zhao, Yi Liu, Enze Shi, Yafei Wang, Aref Sadeghi, Xiaodong Yan, Bei Jiang, Linglong Kong

Distributional reinforcement learning (RL) is a class of state-of-the-art algorithms that estimate the entire distribution of the total return rather than only its expectation. Despite the remarkable performance of distributional RL, a theoretical understanding of its advantages over expectation-based RL remains elusive. In this paper, we illuminate the superiority of distributional RL from both regularization and optimization perspectives. First, by applying an expectation decomposition, we interpret the additional impact of distributional RL relative to expectation-based RL as a risk-aware entropy regularization within the neural Z-fitted iteration framework. We also provide a rigorous comparison between the resulting entropy regularization and the vanilla one used in maximum entropy RL. Through the lens of optimization, we show that the distributional loss promotes stability thanks to its desirable smoothness properties, and we further characterize the acceleration effect that the risk-aware entropy regularization induces. Finally, rigorous experiments reveal the distinct regularization effects, as well as the mutual impact, of vanilla entropy and risk-aware entropy regularization in distributional RL, focusing specifically on actor-critic algorithms. We also empirically verify that distributional RL algorithms enjoy more stable gradient behavior, which contributes to their stable optimization and acceleration effect compared with classical RL. Our research paves the way toward better interpreting the superiority of distributional RL algorithms.
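To make the contrast between the two paradigms concrete, below is a minimal, hypothetical sketch (not code from the paper): an expectation-based TD regression that fits only the mean return Q(s, a), next to a quantile-regression loss in the style of QR-DQN that fits the whole return distribution Z(s, a). The smooth quantile Huber loss is one example of the kind of stability-promoting distributional loss the abstract refers to; all function names, shapes, and parameters here are illustrative assumptions.

```python
import torch

def expected_td_loss(q_pred: torch.Tensor, td_target: torch.Tensor) -> torch.Tensor:
    """Expectation-based RL: regress a scalar Q-value onto its TD target (MSE)."""
    return torch.mean((td_target - q_pred) ** 2)

def quantile_huber_loss(quantiles: torch.Tensor,
                        target_samples: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """Distributional RL (QR-DQN-style): fit N quantiles of the return
    distribution Z(s, a) with the smooth quantile Huber loss.

    quantiles:      (batch, N) predicted quantiles of Z(s, a)
    target_samples: (batch, M) samples from the target return distribution
    """
    n = quantiles.shape[-1]
    taus = (torch.arange(n, dtype=torch.float32) + 0.5) / n   # quantile midpoints
    # Pairwise TD errors between every target sample and every predicted quantile.
    u = target_samples.unsqueeze(-1) - quantiles.unsqueeze(-2)  # (batch, M, N)
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u ** 2,
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric weighting |tau - 1{u < 0}| turns the Huber loss into a
    # quantile-regression loss on the whole distribution.
    weight = (taus - (u.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(-1).mean()

# Illustrative usage with random tensors standing in for network outputs.
quantiles = torch.randn(32, 51)
target_samples = torch.randn(32, 51)
loss = quantile_huber_loss(quantiles, target_samples)
```

The distributional objective carries strictly more information than the scalar one: averaging the fitted quantiles recovers the expectation, while the spread across quantiles is what gives rise to the risk-aware regularization effect discussed above.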
