DDPG, or Deep Deterministic Policy Gradient, is an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. It combines the actor-critic approach with insights from the success of DQN: 1) the network is trained off-policy with samples from a replay buffer to minimize correlations between samples, and 2) the network is trained with a target Q network to give consistent targets during temporal-difference backups. DDPG adopts both ideas, along with batch normalization.
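The two DQN-derived ideas above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's deep-network implementation: the actor and critic are tiny linear/tanh function approximators, the environment transitions are a made-up toy, batch normalization is omitted, and all hyperparameter values are illustrative assumptions. It shows the replay buffer, the target networks used for the TD target, the soft target update with coefficient tau, and the deterministic policy gradient through the critic.

```python
import numpy as np

rng = np.random.default_rng(0)
s_dim, a_dim = 3, 1
gamma, tau, lr = 0.99, 0.005, 1e-3   # illustrative hyperparameters

# Actor mu(s) = tanh(W_a s); critic Q(s, a) = w_c . [s; a]
# (linear stand-ins for the deep networks in the paper)
W_a = rng.normal(scale=0.1, size=(a_dim, s_dim))
w_c = rng.normal(scale=0.1, size=s_dim + a_dim)
W_a_targ, w_c_targ = W_a.copy(), w_c.copy()    # target networks (idea 2)

def mu(W, s):
    return np.tanh(W @ s)

def Q(w, s, a):
    return w @ np.concatenate([s, a])

# Replay buffer (idea 1), filled with toy off-policy transitions:
# s' = s shifted by the action, reward penalizes distance from the origin.
buffer = []
for _ in range(500):
    s = rng.normal(size=s_dim)
    a = rng.uniform(-1.0, 1.0, size=a_dim)
    s2 = s.copy()
    s2[0] += a[0]
    r = -float(s2 @ s2)
    buffer.append((s, a, r, s2))

for step in range(100):
    # Sample a minibatch uniformly from the buffer to break correlations.
    batch = [buffer[i] for i in rng.integers(len(buffer), size=32)]
    for s, a, r, s2 in batch:
        # TD target uses the *target* actor and critic for stable targets.
        y = r + gamma * Q(w_c_targ, s2, mu(W_a_targ, s2))
        td = Q(w_c, s, a) - y
        w_c -= lr * td * np.concatenate([s, a])      # critic: minimize TD error
        # Actor: deterministic policy gradient, dQ/da * dmu/dW_a.
        a_pred = mu(W_a, s)
        dQ_da = w_c[s_dim:]
        W_a += lr * np.outer(dQ_da * (1 - a_pred ** 2), s)
    # Soft ("Polyak") update of the target networks toward the online ones.
    W_a_targ = tau * W_a + (1 - tau) * W_a_targ
    w_c_targ = tau * w_c + (1 - tau) * w_c_targ
```

In the full algorithm the updates above are batched through deep networks, exploration noise is added to the deterministic actions when collecting data, and batch normalization is applied to the state inputs; the target-update coefficient tau is kept small so the targets change slowly.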
Source: Continuous control with deep reinforcement learning
| Task | Papers | Share |
|---|---|---|
| Reinforcement Learning (RL) | 125 | 44.64% |
| Continuous Control | 31 | 11.07% |
| OpenAI Gym | 14 | 5.00% |
| Decision Making | 12 | 4.29% |
| Management | 12 | 4.29% |
| energy management | 6 | 2.14% |
| Autonomous Driving | 6 | 2.14% |
| Multi-agent Reinforcement Learning | 5 | 1.79% |
| Imitation Learning | 4 | 1.43% |