A novel DDPG method with prioritized experience replay

Recently, a state-of-the-art algorithm called deep deterministic policy gradient (DDPG) has achieved good performance on many continuous control tasks in the MuJoCo simulator. To further improve the efficiency of the experience replay mechanism in DDPG and thus speed up training, this paper proposes a prioritized experience replay method for the DDPG algorithm, in which prioritized sampling is adopted instead of uniform sampling. The proposed DDPG with prioritized experience replay is tested on an inverted pendulum task in OpenAI Gym. The experimental results show that DDPG with prioritized experience replay reduces training time, improves the stability of the training process, and is less sensitive to changes in hyperparameters such as the replay buffer size, the minibatch size, and the update rate of the target network.
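To make the mechanism concrete, the sketch below shows a minimal proportional prioritized replay buffer of the kind that could replace DDPG's uniform replay. It is not the paper's reference implementation; the class name and the hyperparameters `alpha`, `beta`, and `eps` are illustrative conventions borrowed from the general prioritized experience replay literature, and the exact prioritization scheme used in the paper may differ.

```python
# Minimal sketch (assumption: proportional prioritization based on TD error,
# not necessarily the exact variant used in this paper).
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities shape sampling (0 = uniform)
        self.beta = beta          # importance-sampling correction exponent
        self.eps = eps            # keeps every priority strictly positive
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        """Store (s, a, r, s', done); new samples get the current max priority."""
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        """Draw a minibatch with probability proportional to priority**alpha."""
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.buffer) * probs[indices]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        """After a critic update, set each priority from the absolute TD error."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG training loop, the critic's absolute TD errors for the sampled minibatch would be fed back through `update_priorities`, and the importance-sampling weights would scale the per-sample critic loss.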
