6 papers with code • 0 benchmarks • 0 datasets
Dota 2 is a multiplayer online battle arena (MOBA). The task is to train one-or-more agents to play and win the game.
( Image credit: OpenAI Five )
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
This poses non-trivial difficulties for researchers or engineers and prevents the application of MARL to a broader range of real-world problems.
Even though death events are rare within a game (1\% of the data), the model achieves 0. 377 precision with 0. 725 recall on test data when prompted to predict which of any of the 10 players of either team will die within 5 seconds.