We recast exploration as State Marginal Matching (SMM): we aim to learn a policy whose state marginal distribution matches a given target state distribution, which can incorporate prior knowledge about the task.
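To make the matching objective concrete, here is a minimal sketch for a small discrete state space. It assumes the mismatch is measured with a KL divergence from the policy's empirical state marginal to the target; the abstract only says the two distributions should match, so the choice of divergence, the function names, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter

def state_marginal(visited_states, num_states):
    """Empirical state marginal: fraction of visits to each state."""
    counts = Counter(visited_states)
    total = len(visited_states)
    return [counts.get(s, 0) / total for s in range(num_states)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical rollout data: indices of states visited by some policy.
visited = [0, 0, 1, 2, 2, 2, 3, 3]
# Hypothetical target: uniform over 4 states (no prior preference).
target = [0.25, 0.25, 0.25, 0.25]

rho = state_marginal(visited, num_states=4)
gap = kl_divergence(rho, target)  # zero only when rho equals the target
```

A policy that matched the target exactly would drive `gap` to zero; a non-uniform target would be one way to encode prior knowledge about which states matter.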
In our approach, we perform online probabilistic filtering of latent task variables to infer how to solve a new task from small amounts of experience.
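As a rough illustration of online filtering over a latent task variable, the sketch below runs a discrete Bayes filter over a handful of candidate tasks. It is not the authors' method: the discrete task set, the Gaussian reward model, and all names here are assumptions chosen to keep the example small.

```python
import math

def gaussian_likelihood(x, mean, std=1.0):
    """Density of x under a normal distribution with the given mean and std."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def filter_step(belief, observation, task_means):
    """One Bayes update: posterior is proportional to prior times likelihood."""
    unnorm = [b * gaussian_likelihood(observation, m)
              for b, m in zip(belief, task_means)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical setup: three candidate tasks, each with a different mean reward.
task_means = [0.0, 1.0, 2.0]
belief = [1 / 3] * 3  # uniform prior over tasks

# A few observed rewards from the new task, processed one at a time.
for reward in [1.9, 2.2, 1.7]:
    belief = filter_step(belief, reward, task_means)
```

After only three observations the belief already concentrates on the third candidate, which is the sense in which filtering lets an agent adapt from small amounts of experience.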
In this multi-agent setting, a set of parallel agents is executed in the same environment, and each of these "rollout" agents is given the means to communicate with the others.
Text-based adventure games provide a platform on which to explore reinforcement learning in the context of a combinatorial action space, such as natural language.
In this paper, we introduce a simple approach to exploration in reinforcement learning (RL) that yields theoretically justified algorithms in the tabular case and also extends to settings that require function approximation.