DDPG, or Deep Deterministic Policy Gradient, is a model-free, actor-critic algorithm based on the deterministic policy gradient that operates over continuous action spaces. It combines the actor-critic approach with two key insights from DQN: 1) the network is trained off-policy with samples from a replay buffer to minimize correlations between samples, and 2) a separate target Q network provides consistent targets during temporal-difference backups. DDPG adopts these ideas along with batch normalization.
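The two DQN-derived ingredients above can be sketched in isolation. This is a minimal illustration, not the paper's implementation: a uniform replay buffer that breaks correlations between consecutive transitions, and a Polyak soft update for the target network parameters (the `tau=0.005` default and the parameter-list representation are assumptions for illustration).

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Uniform replay buffer: sampling random past transitions breaks the
    temporal correlation between consecutive environment steps."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak-averaged target update: theta_target <- tau * theta + (1 - tau) * theta_target.
    A slowly tracking target network keeps the TD targets consistent during training."""
    return [tau * w + (1.0 - tau) * wt for wt, w in zip(target_params, online_params)]
```

At each training step, DDPG samples a minibatch from the buffer to compute TD targets with the target networks, then applies `soft_update` to both the target actor and target critic.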
Source: Continuous control with deep reinforcement learning
Task | Papers | Share |
---|---|---|
Reinforcement Learning (RL) | 125 | 44.17% |
Continuous Control | 33 | 11.66% |
OpenAI Gym | 14 | 4.95% |
Decision Making | 12 | 4.24% |
Management | 12 | 4.24% |
Energy Management | 6 | 2.12% |
Autonomous Driving | 6 | 2.12% |
Multi-agent Reinforcement Learning | 5 | 1.77% |
Meta-Learning | 4 | 1.41% |