TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
SMAC+	Def_Armored_sequential	MADDPG	Median Win Rate	90.6	# 4
SMAC+	Def_Infantry_sequential	MADDPG	Median Win Rate	100	# 1
SMAC+	Def_Outnumbered_sequential	MADDPG	Median Win Rate	81.3	# 2
SMAC+	Off_Complicated_sequential	MADDPG	Median Win Rate	0.0	# 3
SMAC+	Off_Distant_sequential	MADDPG	Median Win Rate	0.0	# 3
SMAC+	Off_Hard_sequential	MADDPG	Median Win Rate	0.0	# 3
SMAC+	Off_Near_sequential	MADDPG	Median Win Rate	75.0	# 3
SMAC+	Off_Superhard_sequential	MADDPG	Median Win Rate	0.0	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-def-infantry-sequential)](https://paperswithcode.com/sota/smac-on-smac-def-infantry-sequential?p=multi-agent-actor-critic-for-mixed)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-def-outnumbered-sequential)](https://paperswithcode.com/sota/smac-on-smac-def-outnumbered-sequential?p=multi-agent-actor-critic-for-mixed)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-off-superhard-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-superhard-sequential?p=multi-agent-actor-critic-for-mixed)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-off-complicated-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-complicated-sequential?p=multi-agent-actor-critic-for-mixed)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-off-distant-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-distant-sequential?p=multi-agent-actor-critic-for-mixed)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-off-hard-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-hard-sequential?p=multi-agent-actor-critic-for-mixed)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-off-near-sequential)](https://paperswithcode.com/sota/smac-on-smac-off-near-sequential?p=multi-agent-actor-critic-for-mixed)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-agent-actor-critic-for-mixed/smac-on-smac-def-armored-sequential)](https://paperswithcode.com/sota/smac-on-smac-def-armored-sequential?p=multi-agent-actor-critic-for-mixed)`
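
The badge lines above all follow one URL pattern, so they can be generated rather than copied by hand. The sketch below is only an illustration inferred from the listed badges: the function name `badge_markdown` and its parameter names are hypothetical, and the two slugs are taken verbatim from the URLs above rather than derived from the dataset names.

```python
def badge_markdown(paper_slug: str, benchmark_slug: str) -> str:
    """Build the Markdown for one leaderboard badge from the two slugs.

    The URL layout is inferred from the badge lines on this page; it is an
    assumption, not a documented API.
    """
    badge_url = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{benchmark_slug}"
    )
    leaderboard_url = (
        f"https://paperswithcode.com/sota/{benchmark_slug}?p={paper_slug}"
    )
    return f"[![PWC]({badge_url})]({leaderboard_url})"


if __name__ == "__main__":
    print(
        badge_markdown(
            "multi-agent-actor-critic-for-mixed",
            "smac-on-smac-def-infantry-sequential",
        )
    )
```

Passing the slugs in as-is avoids guessing how the site turns a name such as `Def_Infantry_sequential` into its URL form.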

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

NeurIPS 2017 · Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
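
The actor-critic adaptation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the linear critic, the toy policies, and every name in it (`joint_input`, `centralized_q`, `td_target`) are hypothetical stand-ins chosen for brevity. It shows only the core idea that each actor maps its own local observation to an action, while the critic scores the joint observations and actions of all agents, and that the one-step target uses the other agents' target policies.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[Vector], Vector]  # maps one agent's local observation to its action


def joint_input(observations: Sequence[Vector], actions: Sequence[Vector]) -> Vector:
    """Concatenate every agent's observation and action into one critic input."""
    flat: Vector = []
    for obs in observations:
        flat.extend(obs)
    for act in actions:
        flat.extend(act)
    return flat


def centralized_q(weights: Vector,
                  observations: Sequence[Vector],
                  actions: Sequence[Vector]) -> float:
    """Toy linear critic Q_i(x, a_1, ..., a_N).

    Assumption: a linear model stands in for the neural-network critic.
    """
    features = joint_input(observations, actions)
    if len(features) != len(weights):
        raise ValueError("weight vector does not match joint input size")
    return sum(w * f for w, f in zip(weights, features))


def td_target(reward: float,
              gamma: float,
              target_weights: Vector,
              next_observations: Sequence[Vector],
              target_policies: Sequence[Policy]) -> float:
    """One-step target y = r_i + gamma * Q'_i(x', a'_1, ..., a'_N).

    Each next action a'_j comes from agent j's target policy applied to
    agent j's own next observation.
    """
    next_actions = [pi(o) for pi, o in zip(target_policies, next_observations)]
    return reward + gamma * centralized_q(target_weights, next_observations, next_actions)


if __name__ == "__main__":
    # Two agents, 2-D observations, 1-D actions; all numbers are made up.
    policies = [lambda o: [0.5 * o[0]], lambda o: [-1.0 * o[1]]]
    next_obs = [[1.0, 0.0], [0.0, 2.0]]
    w = [0.1, 0.1, 0.1, 0.1, 1.0, 1.0]
    print(td_target(1.0, 0.9, w, next_obs, policies))
```

Because the critic is only needed during training, the learned actors can still be executed with local observations alone.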

Code

Repository	Stars
openai/multiagent-particle-envs (official)	2,195
ray-project/ray	31,072
openai/maddpg	1,521
xuehy/pytorch-maddpg	574
shariqiqbal2810/maddpg-pytorch	524

See all 84 implementations

Tasks

Multi-agent Reinforcement Learning

Q-Learning

reinforcement-learning

Reinforcement Learning (RL)

SMAC+

Datasets

100DOH

SMAC-Exp

Def_Armored_sequential

Def_Infantry_sequential

Def_Outnumbered_sequential

Off_Hard_sequential

Off_Complicated_sequential

Off_Distant_sequential

Off_Near_sequential

Off_Superhard_sequential

Results from the Paper

Ranked #1 on SMAC+ on Def_Infantry_sequential

Methods

Adam • Batch Normalization • Convolution • Dense Connections • Experience Replay • MADDPG • Q-Learning • ReLU • Weight Decay
