Towards Understanding Deep Policy Gradients: A Case Study on PPO

CUHK Course IERG5350, 2020 · Buhua Liu, Chong Yin

Deep reinforcement learning has shown impressive performance on many decision-making problems, and deep policy gradient algorithms prevail in continuous-action tasks. Although many algorithm-level improvements to policy gradient methods have been proposed, recent studies have found that code-level optimizations also play a critical role in the claimed gains. In this paper, we further investigate several code-level optimizations for the popular Proximal Policy Optimization (PPO) algorithm, aiming to provide insight into the importance of different components in practical implementations. (Video presentation: https://youtu.be/M0uTLoEUwGQ)
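For context, the algorithm-level core of PPO studied here is its clipped surrogate objective. A minimal NumPy sketch of that per-sample objective follows; the function name and the default clip range of 0.2 are illustrative choices, not details taken from this paper:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage: estimated advantage A(s, a)
    clip_eps:  clipping parameter epsilon (0.2 is a common default)
    """
    # Clip the ratio to [1 - eps, 1 + eps], then take the pessimistic
    # (elementwise minimum) of the clipped and unclipped objectives.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

Code-level optimizations (e.g. value clipping, reward scaling, observation normalization) sit on top of this objective in practical PPO implementations.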
