Counterfactual Multi-Agent Policy Gradients

Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
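The counterfactual baseline described in the abstract can be illustrated with a small numerical sketch. It assumes the critic has already returned, in one forward pass, a vector of Q-values for every candidate action of a single agent while the other agents' actions stay fixed. The function name, argument names, and example numbers below are illustrative assumptions, not taken from the paper's code. The advantage is the Q-value of the action actually taken minus the policy-weighted average of that vector.

```python
import numpy as np


def counterfactual_advantage(q_values, policy_probs, action):
    """Counterfactual advantage for one agent (illustrative sketch).

    q_values:     shape (n_actions,), the critic's Q-value for each candidate
                  action of this agent, with the other agents' actions fixed.
    policy_probs: shape (n_actions,), this agent's current policy over actions.
    action:       index of the action the agent actually took.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_probs = np.asarray(policy_probs, dtype=float)
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = float(np.dot(policy_probs, q_values))
    # Advantage: how much better the taken action is than the baseline.
    return float(q_values[action] - baseline)


# Hypothetical numbers for an agent with three actions.
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action=1)
print(adv)  # 3.0 - (0.5*1.0 + 0.25*3.0 + 0.25*2.0) = 1.25
```

Because the baseline depends only on the other agents' actions and not on the agent's own action, the advantage averages to zero under the agent's policy, which is what lets it serve as a baseline.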


Results from the Paper


| Task  | Dataset                    | Model | Metric Name     | Metric Value | Global Rank |
|-------|----------------------------|-------|-----------------|--------------|-------------|
| SMAC+ | Def_Armored_parallel       | COMA  | Median Win Rate | 0.0          | #6          |
| SMAC+ | Def_Armored_sequential     | COMA  | Median Win Rate | 0.0          | #9          |
| SMAC+ | Def_Infantry_parallel      | COMA  | Median Win Rate | 50.0         | #6          |
| SMAC+ | Def_Infantry_sequential    | COMA  | Median Win Rate | 28.1         | #11         |
| SMAC+ | Def_Outnumbered_parallel   | COMA  | Median Win Rate | 0.0          | #4          |
| SMAC+ | Def_Outnumbered_sequential | COMA  | Median Win Rate | 0.0          | #5          |
| SMAC+ | Off_Complicated_parallel   | COMA  | Median Win Rate | 0.0          | #4          |
| SMAC+ | Off_Complicated_sequential | COMA  | Median Win Rate | 0.0          | #3          |
| SMAC+ | Off_Distant_parallel       | COMA  | Median Win Rate | 0.0          | #3          |
| SMAC+ | Off_Distant_sequential     | COMA  | Median Win Rate | 0.0          | #3          |
| SMAC+ | Off_Hard_parallel          | COMA  | Median Win Rate | 0.0          | #3          |
| SMAC+ | Off_Hard_sequential        | COMA  | Median Win Rate | 0.0          | #3          |
| SMAC+ | Off_Near_parallel          | COMA  | Median Win Rate | 20.0         | #4          |
| SMAC+ | Off_Near_sequential        | COMA  | Median Win Rate | 0.0          | #4          |
| SMAC+ | Off_Superhard_parallel     | COMA  | Median Win Rate | 0.0          | #1          |
| SMAC+ | Off_Superhard_sequential   | COMA  | Median Win Rate | 0.0          | #2          |
