Cooperative Multi-Agent Reinforcement Learning with Sequential Credit Assignment

Centralized training with decentralized execution is a standard paradigm for cooperative multi-agent reinforcement learning (MARL), in which credit assignment is a major challenge. In this paper, we propose a cooperative MARL method with sequential credit assignment (SeCA) that deduces each agent's contribution to the team's success one by one to learn better cooperation. We first present a sequential MARL framework, under which we introduce a new counterfactual advantage that evaluates each agent based on the actions of the agents preceding it in a specific sequence. Because this credit-assignment sequence strongly affects performance, we further present a sequence adjustment algorithm based on integrated gradients, which dynamically reorders the agents according to their contributions to the team. SeCA employs a network that either estimates the Q value for training the centralized critic or derives the proposed advantage of each agent for decentralized policy learning. Our method is evaluated on a challenging set of StarCraft II micromanagement tasks and achieves state-of-the-art performance.
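
The abstract does not give the exact form of the counterfactual advantage, so the following Python sketch only illustrates one plausible reading under stated assumptions: a toy tabular critic stands in for the learned centralized Q value, and each agent is credited in a fixed sequence, with the actual actions of its preceding agents held fixed while its own action (and those of later agents) are marginalized under the current policies to form the baseline. All names (toy_q, expected_q, sequential_advantages) are illustrative and not the paper's code, and the integrated-gradients sequence adjustment is not reproduced here.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4
# Toy tabular critic standing in for a learned centralized Q(s, u) at a fixed state.
q_table = rng.normal(size=(n_actions,) * n_agents)

def toy_q(joint_action):
    return float(q_table[tuple(joint_action)])

def expected_q(fixed, free_agents, policies):
    """Expectation of toy_q over the policies of `free_agents`; other actions stay fixed."""
    total = 0.0
    for combo in itertools.product(range(n_actions), repeat=len(free_agents)):
        joint = list(fixed)
        prob = 1.0
        for agent, act in zip(free_agents, combo):
            joint[agent] = act
            prob *= policies[agent][act]
        total += prob * toy_q(joint)
    return total

def sequential_advantages(joint_action, policies, order):
    """Credit agents one by one along `order` (a sketch, not SeCA's exact formula).

    The agent at position k is evaluated with the actual actions of agents that
    precede it in the sequence held fixed; agents after it are marginalized.
    The baseline additionally marginalizes the agent's own action, so the
    advantage isolates its contribution given its predecessors.
    """
    advantages = {}
    for k, i in enumerate(order):
        later = order[k + 1:]
        value = expected_q(joint_action, later, policies)
        baseline = expected_q(joint_action, [i] + later, policies)
        advantages[i] = value - baseline
    return advantages

# Usage with uniform policies, an arbitrary joint action, and an initial sequence (0, 1, 2).
policies = [np.full(n_actions, 1.0 / n_actions) for _ in range(n_agents)]
print(sequential_advantages([1, 2, 0], policies, order=[0, 1, 2]))
```

Under this reading, the order matters because agents evaluated earlier in the sequence have their successors marginalized out, so changing the sequence changes each agent's credited contribution; that is the lever a sequence-adjustment step can operate on.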
