EXPODE: EXploiting POlicy Discrepancy for Efficient Exploration in Multi-agent Reinforcement Learning

Recently, Multi-Agent Reinforcement Learning (MARL) has been applied to a wide range of scenarios and has shown promising performance. However, existing MARL algorithms still suffer from a severe exploration problem. In this paper, we propose EXploiting POlicy Discrepancy for efficient Exploration (EXPODE), a new multi-agent exploration framework that leverages the discrepancy between two different policies to enable agents to explore the environment more efficiently. In addition, to tackle the mutual-influence issue caused by the agents' concurrent exploration, we propose three different mechanisms that coordinate the agents' exploration by taking other agents' states and policies into account when measuring the agent-wise policy discrepancies. Experimental results on three challenging benchmarks, i.e., Predator Prey, StarCraft II micromanagement tasks, and Google Research Football, demonstrate that EXPODE achieves state-of-the-art performance.
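The abstract does not specify how the policy discrepancy is computed, so the following is only a minimal, hypothetical sketch of the general idea rather than the authors' implementation: it treats the discrepancy between two policies' action distributions (here measured with a KL divergence) as a per-agent exploration bonus and sums the bonuses into an intrinsic reward. The function names, the choice of KL divergence, and the coefficient `beta` are all illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): a discrepancy-based intrinsic
# reward built from the divergence between two policies' action distributions.
import numpy as np

def policy_discrepancy(pi_a, pi_b, eps=1e-8):
    """KL(pi_a || pi_b) for one agent's discrete action distribution."""
    pi_a = np.clip(pi_a, eps, 1.0)
    pi_b = np.clip(pi_b, eps, 1.0)
    return float(np.sum(pi_a * np.log(pi_a / pi_b)))

def intrinsic_reward(policies_a, policies_b, beta=0.1):
    """Sum the agent-wise discrepancies and scale by a bonus coefficient."""
    return beta * sum(policy_discrepancy(a, b)
                      for a, b in zip(policies_a, policies_b))

# Toy usage: two agents, four discrete actions each.
policies_a = [np.array([0.7, 0.1, 0.1, 0.1]),
              np.array([0.25, 0.25, 0.25, 0.25])]
policies_b = [np.array([0.4, 0.2, 0.2, 0.2]),
              np.array([0.1, 0.1, 0.1, 0.7])]
print(intrinsic_reward(policies_a, policies_b))
```

In this reading, a large discrepancy flags states where the two policies disagree, and the bonus steers agents toward them; the paper's coordination mechanisms would additionally condition each agent's discrepancy measure on the other agents' states and policies.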
