EXPODE: EXploiting POlicy Discrepancy for Efficient Exploration in Multi-agent Reinforcement Learning

Recently, Multi-Agent Reinforcement Learning (MARL) has been applied to a wide range of scenarios and has shown promising performance. However, existing MARL algorithms still suffer from a severe exploration problem. In this paper, we propose EXploiting POlicy Discrepancy for efficient Exploration (EXPODE), a new multi-agent exploration framework that leverages the discrepancy between two different policies to enable agents to explore the environment more efficiently. In addition, to tackle the mutual-influence issue caused by the agents' concurrent exploration, we propose three different mechanisms that coordinate the agents' exploration by taking other agents' states and policies into account when measuring the agent-wise policy discrepancies. Experimental results on three challenging benchmarks, i.e., Predator Prey, StarCraft II micromanagement tasks, and Google Research Football, demonstrate that EXPODE achieves state-of-the-art performance.
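The abstract does not specify how the policy discrepancy is computed, so the following is only a minimal, hypothetical sketch of the general idea rather than the authors' implementation: it treats the discrepancy between two policies' action distributions (here measured with a KL divergence) as a per-agent exploration bonus and sums the bonuses into an intrinsic reward. The function names, the choice of KL divergence, and the coefficient `beta` are all illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): a discrepancy-based intrinsic
# reward built from the divergence between two policies' action distributions.
import numpy as np

def policy_discrepancy(pi_a, pi_b, eps=1e-8):
    """KL(pi_a || pi_b) for one agent's discrete action distribution."""
    pi_a = np.clip(pi_a, eps, 1.0)
    pi_b = np.clip(pi_b, eps, 1.0)
    return float(np.sum(pi_a * np.log(pi_a / pi_b)))

def intrinsic_reward(policies_a, policies_b, beta=0.1):
    """Sum the agent-wise discrepancies and scale by a bonus coefficient."""
    return beta * sum(policy_discrepancy(a, b)
                      for a, b in zip(policies_a, policies_b))

# Toy usage: two agents, four discrete actions each.
policies_a = [np.array([0.7, 0.1, 0.1, 0.1]),
              np.array([0.25, 0.25, 0.25, 0.25])]
policies_b = [np.array([0.4, 0.2, 0.2, 0.2]),
              np.array([0.1, 0.1, 0.1, 0.7])]
print(intrinsic_reward(policies_a, policies_b))
```

In this reading, a large discrepancy flags states where the two policies disagree, and the bonus steers agents toward them; the paper's coordination mechanisms would additionally condition each agent's discrepancy measure on the other agents' states and policies.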
