A Distributional Perspective on Actor-Critic Framework

1 Jan 2021  ·  Daniel Wontae Nam, Younghoon Kim, Chan Youn Park

Recent distributional reinforcement learning methods, despite their successes, still suffer from fundamental problems that can lead to inaccurate representations of value distributions, such as distributional instability, restriction to a particular action type, and biased approximation. In this paper, we present a novel distributional actor-critic framework, GMAC, to address these problems. Adopting a stochastic policy removes the first two problems, and the bias in approximation is alleviated by minimizing the Cramér distance between the value distribution and its Bellman target distribution. In addition, GMAC improves data efficiency by generating the Bellman target distribution through the Sample-Replacement algorithm, denoted SR(λ), which provides a distributional generalization of multi-step policy evaluation algorithms. We empirically show that our method captures the multimodality of value distributions and improves the performance of conventional actor-critic methods at low computational cost in both discrete and continuous action spaces, using the Arcade Learning Environment (ALE) and PyBullet environments.
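The abstract's key training signal is the Cramér distance between the predicted value distribution and its Bellman target distribution. As a minimal sketch of that quantity (not the paper's implementation), the snippet below computes the squared Cramér (l2) distance between two sample-based distributions by integrating the squared difference of their empirical CDFs; the sample arrays and the toy "r + γZ'" target are illustrative assumptions.

```python
import numpy as np

def cramer_distance_sq(samples_p, samples_q):
    """Squared Cramer (l2) distance between two empirical distributions:
    the integral of (F_P(x) - F_Q(x))^2 over the merged support."""
    grid = np.sort(np.concatenate([samples_p, samples_q]))
    # Empirical CDFs of each sample set evaluated on the merged grid.
    cdf_p = np.searchsorted(np.sort(samples_p), grid, side="right") / len(samples_p)
    cdf_q = np.searchsorted(np.sort(samples_q), grid, side="right") / len(samples_q)
    # Piecewise-constant CDFs: integrate the squared gap over each grid interval.
    dx = np.diff(grid)
    return float(np.sum((cdf_p[:-1] - cdf_q[:-1]) ** 2 * dx))

# Toy usage: distance between a sampled value distribution and a
# hypothetical one-step Bellman target r + gamma * Z' (illustrative only).
rng = np.random.default_rng(0)
value_samples = rng.normal(0.0, 1.0, size=256)
target_samples = 0.1 + 0.99 * rng.normal(0.5, 1.0, size=256)
print(cramer_distance_sq(value_samples, target_samples))
```

Unlike the KL divergence used by categorical distributional methods, this distance remains well defined when the two distributions have disjoint or shifting supports, which is one reason sample-based targets pair naturally with it.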
