Attention mechanisms have become a standard tool for sequence modeling, in particular in the form of stacked self-attention layers applied over the entire input sequence, as in the Transformer architecture.
We investigate the training of sparse layers in large Transformer models that use different parameters for different inputs, with the parameters selected by hashing.
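As a rough illustration of this routing scheme, here is a minimal PyTorch sketch, assuming token-id-based routing; the class name HashFFN and the simple modulo hash are illustrative stand-ins, not the paper's implementation:

```python
import torch
import torch.nn as nn

class HashFFN(nn.Module):
    """Sparse feed-forward layer: each token is routed to one expert by a fixed hash."""
    def __init__(self, d_model, d_hidden, n_experts):
        super().__init__()
        self.n_experts = n_experts
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x, token_ids):
        # x: (batch, seq, d_model); token_ids: (batch, seq) integer token ids.
        # A fixed, parameter-free hash (here: modulo) assigns each token to an expert.
        assignment = token_ids % self.n_experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = assignment == e           # (batch, seq) boolean
            if mask.any():
                out[mask] = expert(x[mask])  # apply this expert to its tokens only
        return out
```

Because the hash is fixed and parameter-free, this sketch needs no learned router or load-balancing objective, and each token only evaluates the one expert it hashes to.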
We demonstrate that Expire-Span can help models identify and retain critical information, and show that it achieves strong performance on reinforcement learning tasks specifically designed to challenge this functionality, as well as state-of-the-art results on long-context language modeling and algorithmic tasks.
In this work, we present a memory-augmented approach for image-goal navigation.
Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.
Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks.
More precisely, we augment the self-attention layers with persistent memory vectors that play a role similar to that of the feed-forward layer.
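A minimal single-head sketch of this idea, assuming the persistent vectors are learned key/value pairs concatenated to the sequence's keys and values so that every position can attend to them (PersistentSelfAttention and n_persist are illustrative names):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersistentSelfAttention(nn.Module):
    """Single-head self-attention whose keys/values include learned persistent vectors."""
    def __init__(self, d_model, n_persist):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Persistent memory: learned key/value pairs shared across all positions.
        self.mem_k = nn.Parameter(torch.randn(n_persist, d_model) * 0.02)
        self.mem_v = nn.Parameter(torch.randn(n_persist, d_model) * 0.02)

    def forward(self, x):
        # x: (batch, seq, d_model)
        b, _, d = x.shape
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Concatenate the persistent vectors so every query also attends to them.
        k = torch.cat([k, self.mem_k.unsqueeze(0).expand(b, -1, -1)], dim=1)
        v = torch.cat([v, self.mem_v.unsqueeze(0).expand(b, -1, -1)], dim=1)
        attn = F.softmax(q @ k.transpose(1, 2) / math.sqrt(d), dim=-1)
        return attn @ v  # (batch, seq, d_model)
```

Since the persistent vectors are input-independent, they behave like a fixed lookup table mixed in by attention, which is why they can absorb the role of the feed-forward sublayer.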
In this paper, we study the problem of hybrid language modeling, that is, using models that can predict both characters and larger units such as character n-grams or words.
In hierarchical reinforcement learning, a major challenge is determining appropriate low-level policies.
A desirable property of an intelligent agent is the ability to understand its environment well enough to generalize quickly to novel tasks and to compose simpler tasks into more complex ones.
When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases the agent converges to a higher reward.
We describe a very simple bag-of-words baseline for visual question answering.
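A minimal sketch of what such a baseline can look like, assuming precomputed image features (e.g., from a CNN) and a fixed answer vocabulary; all names here are illustrative:

```python
import torch
import torch.nn as nn

class BowVQA(nn.Module):
    """Bag-of-words VQA baseline: BoW question vector + image features -> linear softmax."""
    def __init__(self, vocab_size, d_word, d_image, n_answers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_word, padding_idx=0)
        self.classify = nn.Linear(d_word + d_image, n_answers)

    def forward(self, question_ids, image_feat):
        # question_ids: (batch, n_words) padded word ids; image_feat: (batch, d_image).
        # Summing embeddings discards word order: a pure bag-of-words encoding.
        bow = self.embed(question_ids).sum(dim=1)
        logits = self.classify(torch.cat([bow, image_feat], dim=-1))
        return logits  # train with cross-entropy over the answer vocabulary
```

Summing the word embeddings discards word order entirely, which is exactly what makes this a bag-of-words baseline rather than a sequence model.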
This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning.
On question answering, our approach is competitive with Memory Networks but requires less supervision.
The availability of large labeled datasets has allowed Convolutional Network models to achieve impressive recognition results.