Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning
In cooperative multi-agent reinforcement learning, state transitions, rewards, and actions can all induce randomness (or uncertainty) in the observed long-term returns. This randomness stems from two sources of risk: (a) agent-wise risk (i.e., how cooperatively a given agent's teammates act) and (b) environment-wise risk (i.e., transition stochasticity). Although both sources are important for learning robust agent policies, prior works either do not separate them or handle only a single risk source, which could lead to suboptimal equilibria. In this paper, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA), a novel framework capable of disentangling these risk sources. Our main idea is to separate the risk-level leverages (i.e., quantiles) in both centralized training and decentralized execution using a hierarchical quantile structure and quantile regression. Our experiments demonstrate that DRIMA significantly outperforms prior art across various scenarios in the StarCraft Multi-Agent Challenge. Notably, DRIMA shows robust performance regardless of reward shaping and exploration schedule, including settings in which prior methods learn only suboptimal policies.
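
The quantile-regression idea the abstract relies on can be illustrated in isolation. The sketch below is a minimal, single-agent illustration under stated assumptions, not DRIMA's implementation: it uses the standard QR-DQN-style quantile Huber loss to fit N quantiles of a return distribution, then reads a scalar value off those quantiles at a chosen risk level `w` (a low `w` is pessimistic, a high `w` optimistic). DRIMA keeps separate risk levels for agent-wise and environment-wise risk inside a hierarchical quantile structure; this sketch uses a single level, and every function name as well as the nearest-quantile read-out are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: generic quantile regression and risk-level action
# selection, NOT DRIMA's actual architecture, loss, or risk-level scheme.


def quantile_fractions(n: int) -> List[float]:
    """Midpoint quantile fractions tau_i = (2i + 1) / (2n), i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def huber(u: float, kappa: float = 1.0) -> float:
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber_loss(pred: Sequence[float], target: Sequence[float],
                        kappa: float = 1.0) -> float:
    """QR-DQN-style quantile regression loss.

    pred:   N predicted quantiles of the return distribution.
    target: M samples of the Bellman target distribution.
    Each residual u = z_j - theta_i is weighted by |tau_i - 1{u < 0}|, which
    pushes theta_i toward the tau_i-th quantile of the targets.
    Summed over predicted quantiles, averaged over target samples.
    """
    taus = quantile_fractions(len(pred))
    total = 0.0
    for tau, theta in zip(taus, pred):
        for z in target:
            u = z - theta
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber(u, kappa) / kappa
    return total / len(target)


def value_at_risk_level(quantiles: Sequence[float], w: float) -> float:
    """Return the predicted quantile whose fraction tau_i is closest to w.

    Assumes `quantiles` is sorted in ascending order (aligned with the taus).
    Small w is risk-averse, large w is risk-seeking.
    """
    taus = quantile_fractions(len(quantiles))
    i = min(range(len(taus)), key=lambda k: abs(taus[k] - w))
    return quantiles[i]


def greedy_action(per_action_quantiles: Sequence[Sequence[float]],
                  w: float) -> int:
    """Pick the action whose value at risk level w is largest."""
    values = [value_at_risk_level(q, w) for q in per_action_quantiles]
    return max(range(len(values)), key=lambda a: values[a])


if __name__ == "__main__":
    # Action 0 always returns 1.0; action 1 has mean 1.0 but a wide spread.
    quantiles_per_action = [[1.0, 1.0, 1.0, 1.0], [-2.0, 0.0, 2.0, 4.0]]
    print(greedy_action(quantiles_per_action, w=0.1))  # -> 0 (risk-averse)
    print(greedy_action(quantiles_per_action, w=0.9))  # -> 1 (risk-seeking)
    print(quantile_huber_loss([0.0], [1.0]))           # -> 0.25
```

Both actions in the usage example have the same mean return, so a risk-neutral learner cannot tell them apart; only the choice of quantile level changes which one the greedy step prefers. This is the kind of lever that separating risk levels is meant to control.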

Task | Dataset | Model | Median Win Rate (%) | Global Rank |
---|---|---|---|---|
SMAC+ | Def_Armored_parallel | DRIMA | 60.0 | 3 |
SMAC+ | Def_Armored_sequential | DRIMA | 100.0 | 1 |
SMAC+ | Def_Infantry_parallel | DRIMA | 100.0 | 1 |
SMAC+ | Def_Infantry_sequential | DRIMA | 100.0 | 1 |
SMAC+ | Def_Outnumbered_parallel | DRIMA | 70.0 | 1 |
SMAC+ | Def_Outnumbered_sequential | DRIMA | 100.0 | 1 |
SMAC+ | Off_Complicated_parallel | DRIMA | 100.0 | 1 |
SMAC+ | Off_Complicated_sequential | DRIMA | 96.9 | 1 |
SMAC+ | Off_Distant_parallel | DRIMA | 95.0 | 1 |
SMAC+ | Off_Distant_sequential | DRIMA | 100.0 | 1 |
SMAC+ | Off_Hard_parallel | DRIMA | 80.0 | 1 |
SMAC+ | Off_Hard_sequential | DRIMA | 93.8 | 2 |
SMAC+ | Off_Near_parallel | DRIMA | 95.0 | 1 |
SMAC+ | Off_Near_sequential | DRIMA | 93.8 | 1 |
SMAC+ | Off_Superhard_parallel | DRIMA | 0.0 | 1 |
SMAC+ | Off_Superhard_sequential | DRIMA | 15.6 | 1 |