Cooperative Multi-Agent Reinforcement Learning with Sequential Credit Assignment

Centralized training with decentralized execution is a standard paradigm for cooperative multi-agent reinforcement learning (MARL), in which credit assignment is a major challenge. In this paper, we propose a cooperative MARL method with sequential credit assignment (SeCA) that deduces each agent's contribution to the team's success one by one to learn better cooperation. We first present a sequential MARL framework, under which we introduce a new counterfactual advantage that evaluates each agent based on the actions of the agents preceding it in a specific sequence. Because this credit-assignment sequence strongly affects performance, we further present a sequence adjustment algorithm based on integrated gradients, which dynamically reorders the agents according to their contributions to the team. SeCA employs a network that either estimates the Q value for training the centralized critic or derives the proposed advantage of each agent for decentralized policy learning. Our method is evaluated on a challenging set of StarCraft II micromanagement tasks and achieves state-of-the-art performance.
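
The abstract does not give the exact form of the counterfactual advantage, so the following Python sketch only illustrates one plausible reading under stated assumptions: a toy tabular critic stands in for the learned centralized Q value, and each agent is credited in a fixed sequence, with the actual actions of its preceding agents held fixed while its own action (and those of later agents) are marginalized under the current policies to form the baseline. All names (toy_q, expected_q, sequential_advantages) are illustrative and not the paper's code, and the integrated-gradients sequence adjustment is not reproduced here.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4
# Toy tabular critic standing in for a learned centralized Q(s, u) at a fixed state.
q_table = rng.normal(size=(n_actions,) * n_agents)

def toy_q(joint_action):
    return float(q_table[tuple(joint_action)])

def expected_q(fixed, free_agents, policies):
    """Expectation of toy_q over the policies of `free_agents`; other actions stay fixed."""
    total = 0.0
    for combo in itertools.product(range(n_actions), repeat=len(free_agents)):
        joint = list(fixed)
        prob = 1.0
        for agent, act in zip(free_agents, combo):
            joint[agent] = act
            prob *= policies[agent][act]
        total += prob * toy_q(joint)
    return total

def sequential_advantages(joint_action, policies, order):
    """Credit agents one by one along `order` (a sketch, not SeCA's exact formula).

    The agent at position k is evaluated with the actual actions of agents that
    precede it in the sequence held fixed; agents after it are marginalized.
    The baseline additionally marginalizes the agent's own action, so the
    advantage isolates its contribution given its predecessors.
    """
    advantages = {}
    for k, i in enumerate(order):
        later = order[k + 1:]
        value = expected_q(joint_action, later, policies)
        baseline = expected_q(joint_action, [i] + later, policies)
        advantages[i] = value - baseline
    return advantages

# Usage with uniform policies, an arbitrary joint action, and an initial sequence (0, 1, 2).
policies = [np.full(n_actions, 1.0 / n_actions) for _ in range(n_agents)]
print(sequential_advantages([1, 2, 0], policies, order=[0, 1, 2]))
```

Under this reading, the order matters because agents evaluated earlier in the sequence have their successors marginalized out, so changing the sequence changes each agent's credited contribution; that is the lever a sequence-adjustment step can operate on.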
