Real-Time Strategy (RTS) tasks involve training an agent to play video games with continuous gameplay and high-level macro-strategic goals such as map control, economic superiority and more.
( Image credit: Multi-platform Version of StarCraft: Brood War in a Docker Container )
Training agents using Reinforcement Learning in games with sparse rewards is a challenging problem, since large amounts of exploration are required to retrieve even the first reward.
Specifically, our experiments show that invalid action masking scales well when the space of invalid actions is large, while the common approach of giving negative rewards for invalid actions will fail.
However, few Constraint Programming formalisms can deal with both optimization and uncertainty at the same time, and none of them are convenient to model problems we tackle in this paper.
Real-Time Strategy (RTS) games have recently become a popular testbed for artificial intelligence research.
These rules are not scalable and efficient enough to cope with the enormous yet partially observed state space in the game.
StarCraft, one of the most popular real-time strategy games, is a compelling environment for artificial intelligence research for both micro-level unit control and macro-level strategic decision making.
We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games.
Both TStarBot1 and TStarBot2 are able to defeat the built-in AI agents from level 1 to level 10 in a full game (1v1 Zerg-vs-Zerg game on the AbyssalReef map), noting that level 8, level 9, and level 10 are cheating agents with unfair advantages such as full vision on the whole map and resource harvest boosting.
Reinforcement learning (RL) is an area of research that has blossomed tremendously in recent years and has shown remarkable potential for artificial intelligence based opponents in computer games.