DDPG, or Deep Deterministic Policy Gradient, is a model-free, actor-critic algorithm based on the deterministic policy gradient that can operate over continuous action spaces. It combines the actor-critic approach with two insights from Deep Q-Networks (DQN): 1) the network is trained off-policy with samples from a replay buffer to minimize correlations between samples, and 2) the network is trained with a target Q-network to give consistent targets during temporal-difference backups. DDPG makes use of the same ideas, along with batch normalization.
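The two DQN-derived ideas above can be sketched in isolation. Below is a minimal, hypothetical illustration (plain NumPy, not the paper's implementation): a uniform-sampling replay buffer that breaks temporal correlations, and the soft "Polyak" target update θ′ ← τθ + (1 − τ)θ′ that DDPG uses to keep TD targets consistent.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size experience buffer; uniform sampling decorrelates transitions."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

def soft_update(target_params, online_params, tau=0.005):
    """Polyak-average online weights into the target network:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

# Toy usage with random transitions (shapes are illustrative).
buf = ReplayBuffer(capacity=100)
for _ in range(50):
    s = np.random.randn(3)
    buf.add(s, np.random.uniform(-1, 1, size=1), 0.0, s + 0.1, False)
states, actions, rewards, next_states, dones = buf.sample(batch_size=8)
print(states.shape)  # (8, 3)

# One soft target update step: target drifts slowly toward the online weights.
online = [np.ones((2, 2))]
target = [np.zeros((2, 2))]
target = soft_update(target, online, tau=0.1)
print(target[0][0, 0])  # 0.1
```

In the full algorithm, both the actor and the critic maintain target copies updated this way, and the critic's TD target is computed entirely from those slowly-moving copies.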
Source: Continuous control with deep reinforcement learning
| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 123 | 46.24% |
| Continuous Control | 30 | 11.28% |
| OpenAI Gym | 14 | 5.26% |
| Management | 11 | 4.14% |
| Decision Making | 9 | 3.38% |
| Energy Management | 6 | 2.26% |
| Autonomous Driving | 6 | 2.26% |
| Multi-agent Reinforcement Learning | 5 | 1.88% |
| Imitation Learning | 4 | 1.50% |