Towards Understanding Deep Policy Gradients: A Case Study on PPO

CUHK Course IERG5350, 2020 · Buhua Liu, Chong Yin

Deep reinforcement learning has shown impressive performance on many decision-making problems, and deep policy gradient algorithms prevail in continuous-action tasks. Although many algorithm-level improvements to policy gradient methods have been proposed, recent studies have found that code-level optimizations also play a critical role in the claimed gains. In this paper, we further investigate several code-level optimizations for the popular Proximal Policy Optimization (PPO) algorithm, aiming to provide insight into the importance of different components in practical implementations. (Video presentation: https://youtu.be/M0uTLoEUwGQ)
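For context, the algorithm-level core of PPO studied here is its clipped surrogate objective. A minimal NumPy sketch of that per-sample objective follows; the function name and the default clip range of 0.2 are illustrative choices, not details taken from this paper:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage: estimated advantage A(s, a)
    clip_eps:  clipping parameter epsilon (0.2 is a common default)
    """
    # Clip the ratio to [1 - eps, 1 + eps], then take the pessimistic
    # (elementwise minimum) of the clipped and unclipped objectives.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

Code-level optimizations (e.g. value clipping, reward scaling, observation normalization) sit on top of this objective in practical PPO implementations.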
