Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning

29 Sep 2021 · Kyunghwan Son, Junsu Kim, Yung Yi, Jinwoo Shin

In cooperative multi-agent reinforcement learning, state transitions, rewards, and actions can all induce randomness (or uncertainty) in the observed long-term returns. This randomness stems from two risk sources: (a) agent-wise risk (i.e., how cooperatively a given agent's teammates act) and (b) environment-wise risk (i.e., transition stochasticity). Although both sources are important for learning robust agent policies, prior works either do not separate them or deal with only a single risk source, which could lead to suboptimal equilibria. In this paper, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA), a novel framework capable of disentangling these risk sources. Our main idea is to separate the risk-level leverages (i.e., quantiles) in both centralized training and decentralized execution, using a hierarchical quantile structure and quantile regression. Our experiments demonstrate that DRIMA significantly outperforms prior methods across various scenarios in the StarCraft Multi-Agent Challenge (SMAC). Notably, DRIMA performs robustly regardless of reward shaping and exploration schedule, settings in which prior methods learn only suboptimal policies.
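The abstract relies on quantile regression to learn return distributions whose quantile fractions serve as risk levels. As a rough illustration only, the Python sketch below shows the generic ingredients under stated assumptions. It uses the quantile Huber loss from QR-DQN with fixed quantile midpoints, and a CVaR-style average of the lower quantiles as a stand-in risk-sensitive action rule. It is not DRIMA's implementation and does not model the hierarchical agent-wise/environment-wise quantile structure. The function names and the risk level `alpha` are hypothetical.

```python
# Hypothetical, generic sketch of quantile-regression ingredients used in
# distributional RL; NOT DRIMA's code and not its hierarchical quantile structure.
from typing import List, Sequence


def quantile_midpoints(n: int) -> List[float]:
    """Fixed quantile fractions tau_i = (2i + 1) / (2n), as in QR-DQN."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def quantile_huber_loss(
    predicted: Sequence[float],
    taus: Sequence[float],
    targets: Sequence[float],
    kappa: float = 1.0,
) -> float:
    """Quantile Huber loss between predicted quantiles and target return samples.

    predicted[i] estimates the taus[i]-quantile of the return; targets are samples
    of the bootstrapped target return (e.g. r + gamma * next-state quantiles).
    """
    if len(predicted) != len(taus):
        raise ValueError("need one quantile fraction per predicted quantile")
    if not targets:
        raise ValueError("need at least one target sample")
    total = 0.0
    for theta, tau in zip(predicted, taus):
        for y in targets:
            u = y - theta  # pairwise TD error
            # Huber penalty: quadratic for |u| <= kappa, linear beyond
            if abs(u) <= kappa:
                huber = 0.5 * u * u
            else:
                huber = kappa * (abs(u) - 0.5 * kappa)
            # Asymmetric weight |tau - 1{u < 0}| pulls theta toward the tau-quantile
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            total += weight * huber / kappa
    # Sum over predicted quantiles, average over target samples
    return total / len(targets)


def risk_sensitive_value(
    quantiles: Sequence[float], taus: Sequence[float], alpha: float
) -> float:
    """CVaR-style score: mean of the quantiles whose fraction tau is <= alpha.

    alpha = 1.0 gives the risk-neutral mean; smaller alpha is more risk-averse.
    """
    selected = [q for q, t in zip(quantiles, taus) if t <= alpha]
    if not selected:
        raise ValueError("alpha is below the smallest quantile fraction")
    return sum(selected) / len(selected)


def greedy_action(
    per_action_quantiles: List[List[float]], taus: Sequence[float], alpha: float
) -> int:
    """Index of the action with the highest risk-sensitive score."""
    scores = [risk_sensitive_value(q, taus, alpha) for q in per_action_quantiles]
    return max(range(len(scores)), key=lambda a: scores[a])


if __name__ == "__main__":
    taus = quantile_midpoints(4)  # [0.125, 0.375, 0.625, 0.875]
    safe = [1.0, 1.5, 2.0, 2.5]  # mean 1.75, narrow spread
    risky = [-3.0, 1.0, 3.0, 7.0]  # mean 2.0, wide spread
    print(greedy_action([safe, risky], taus, alpha=1.0))  # prints 1: risk-neutral
    print(greedy_action([safe, risky], taus, alpha=0.5))  # prints 0: risk-averse
```

With `alpha = 1.0` the score is the plain mean of the quantiles, so the wider-spread action wins. With `alpha = 0.5` only the lower half of the distribution counts, and the narrower action is preferred. Averaging the lower tail is only one choice of risk measure. Conditioning on a single quantile level is another.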


Results from the Paper


Task   Dataset                     Model  Metric Name      Value (%)  Global Rank
SMAC+  Def_Armored_parallel        DRIMA  Median Win Rate  60.0       #3
SMAC+  Def_Armored_sequential      DRIMA  Median Win Rate  100.0      #1
SMAC+  Def_Infantry_parallel       DRIMA  Median Win Rate  100.0      #1
SMAC+  Def_Infantry_sequential     DRIMA  Median Win Rate  100.0      #1
SMAC+  Def_Outnumbered_parallel    DRIMA  Median Win Rate  70.0       #1
SMAC+  Def_Outnumbered_sequential  DRIMA  Median Win Rate  100.0      #1
SMAC+  Off_Complicated_parallel    DRIMA  Median Win Rate  100.0      #1
SMAC+  Off_Complicated_sequential  DRIMA  Median Win Rate  96.9       #1
SMAC+  Off_Distant_parallel        DRIMA  Median Win Rate  95.0       #1
SMAC+  Off_Distant_sequential      DRIMA  Median Win Rate  100.0      #1
SMAC+  Off_Hard_parallel           DRIMA  Median Win Rate  80.0       #1
SMAC+  Off_Hard_sequential         DRIMA  Median Win Rate  93.8       #2
SMAC+  Off_Near_parallel           DRIMA  Median Win Rate  95.0       #1
SMAC+  Off_Near_sequential         DRIMA  Median Win Rate  93.8       #1
SMAC+  Off_Superhard_parallel      DRIMA  Median Win Rate  0.0        #1
SMAC+  Off_Superhard_sequential    DRIMA  Median Win Rate  15.6       #1
