Compressive Transformers for Long-Range Sequence Modelling

ICLR 2020 · Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively...
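As a rough illustration of the memory update the abstract describes, the sketch below keeps a FIFO of recent hidden states and, rather than discarding the states it evicts, mean-pools them (one of the compression functions considered in the paper) at a fixed compression rate into a secondary compressed memory. It is a minimal sketch only: the function and parameter names (`update_memories`, `comp_rate`, etc.) are illustrative and not taken from the paper's code.

```python
import numpy as np

def update_memories(memory, comp_memory, new_hiddens, comp_rate=3):
    """Append new hidden states to the short-term memory; the oldest
    (evicted) states are mean-pooled in blocks of `comp_rate` and
    appended to the long-term compressed memory instead of being dropped.

    memory:      [n_mem, d]   FIFO of recent hidden states
    comp_memory: [n_cmem, d]  compressed long-term memory
    new_hiddens: [seq, d]     hidden states from the current segment
    """
    n_mem, d = memory.shape

    # Push the new segment into the short-term memory (FIFO).
    memory = np.concatenate([memory, new_hiddens], axis=0)
    evicted, memory = memory[:-n_mem], memory[-n_mem:]

    # Compress the evicted states by mean pooling over blocks of size
    # `comp_rate` (an illustrative stand-in for the paper's learned or
    # pooling-based compression functions).
    n_blocks = evicted.shape[0] // comp_rate
    evicted = evicted[: n_blocks * comp_rate]
    compressed = evicted.reshape(n_blocks, comp_rate, d).mean(axis=1)
    comp_memory = np.concatenate([comp_memory, compressed], axis=0)

    return memory, comp_memory
```

Because evicted activations are compressed rather than discarded, later segments can still attend over them, which is what gives the model a much longer effective context than the raw memory length alone.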
