Policy Gradient Methods

MADDPG, or Multi-Agent DDPG, extends DDPG into a multi-agent policy gradient algorithm in which each decentralized agent learns a centralized critic based on the observations and actions of all agents. The learned policies use only local information (i.e., each agent's own observations) at execution time, require no differentiable model of the environment dynamics and no particular structure on the communication method between agents, and apply not only to cooperative interaction but also to competitive or mixed settings involving both physical and communicative behavior. The critic is augmented with extra information about the policies of other agents, while the actor has access only to local information. After training is complete, only the local actors are used at execution time, acting in a decentralized manner.

Source: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
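To make the centralized-critic / decentralized-actor split concrete, here is a minimal PyTorch sketch. PyTorch, the network sizes, and the names (`Actor`, `CentralizedCritic`, `obs_dim`, `act_dim`, `n_agents`) are illustrative assumptions, not taken from the paper's reference implementation: each agent's critic consumes the joint observations and actions of all agents during training, while each actor consumes only its own observation.

```python
# Minimal sketch of MADDPG's centralized-critic / decentralized-actor
# structure (assumed PyTorch; dimensions and names are illustrative).
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralized actor: maps an agent's own observation to an action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # bounded continuous actions
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class CentralizedCritic(nn.Module):
    """Centralized critic: scores the joint observations and actions of
    all agents; used only during training, never at execution time."""
    def __init__(self, n_agents: int, obs_dim: int, act_dim: int):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents * obs_dim); all_acts: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_acts], dim=-1))


if __name__ == "__main__":
    n_agents, obs_dim, act_dim, batch = 3, 8, 2, 32
    actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
    critics = [CentralizedCritic(n_agents, obs_dim, act_dim) for _ in range(n_agents)]

    obs = torch.randn(batch, n_agents, obs_dim)
    # Training: each agent's critic sees everyone's observations and actions.
    acts = torch.stack([actors[i](obs[:, i]) for i in range(n_agents)], dim=1)
    q_values = [critics[i](obs.flatten(1), acts.flatten(1)) for i in range(n_agents)]
    # Execution: each actor acts on its own local observation alone.
    local_action = actors[0](obs[:, 0])
```

One critic per agent is used here because, in mixed or competitive settings, agents can have different reward functions and thus need separate value estimates.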
