We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep and the attention weights of other agents.
We examine whether self-supervised language modeling applied to mathematical formulas enables logical reasoning.
We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous.
We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed-dimensional latent space.
We present a novel modular architecture for StarCraft II AI.
Imitation learning is a powerful paradigm for robot skill acquisition.