## Compressed Memory

Introduced by Rae et al. in Compressive Transformers for Long-Range Sequence Modelling

Compressed Memory is a secondary FIFO memory component proposed as part of the Compressive Transformer model. The Compressive Transformer keeps a fine-grained memory of past activations; as the oldest of these are evicted, they are compressed into a coarser compressed memory rather than discarded.
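The two-tier FIFO behaviour can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the names `update_memories` and `mean_pool` are hypothetical, mean pooling stands in for the compression function, and the sketch assumes non-empty memories and that each new segment is no longer than the fine-grained memory.

```python
import numpy as np

def mean_pool(x, c):
    # Stand-in compression function (an assumption of this sketch):
    # average each block of c consecutive rows, dropping any remainder.
    n, d = x.shape
    k = n // c
    return x[:k * c].reshape(k, c, d).mean(axis=1)

def update_memories(memory, comp_memory, new_acts, c, compress=mean_pool):
    """One FIFO update step.

    memory:      (n_m, d)  fine-grained memory, oldest row first
    comp_memory: (n_cm, d) compressed memory, oldest row first
    new_acts:    (n_s, d)  newest activations, with n_s <= n_m
    """
    n_m, n_cm, n_s = len(memory), len(comp_memory), len(new_acts)
    # The oldest n_s fine-grained rows are evicted to make room.
    evicted = memory[:n_s]
    memory = np.concatenate([memory[n_s:], new_acts], axis=0)[-n_m:]
    # Evicted rows are compressed instead of being thrown away.
    new_comp = compress(evicted, c)
    # The compressed memory is also FIFO: keep only its newest n_cm rows.
    comp_memory = np.concatenate([comp_memory, new_comp], axis=0)[-n_cm:]
    return memory, comp_memory
```

With a compression rate of `c`, each update moves `n_s` rows out of the fine-grained memory and adds roughly `n_s / c` rows to the compressed memory, so the compressed memory covers a longer span of the past for the same number of slots.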

For the compression function $f_{c}$, the authors consider (1) max/mean pooling, where the kernel and stride are set to the compression rate $c$; (2) 1D convolution, also with kernel and stride set to $c$; (3) dilated convolutions; and (4) most-used, where the memories are sorted by their average attention (usage) and the most-used are preserved.
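As a rough illustration of options (1) and (4), the sketch below implements pooling and most-used selection in NumPy. The function names, the choice to drop a trailing remainder, the choice to keep `n // c` rows for most-used, and keeping the selected rows in their original temporal order are all assumptions of this sketch rather than details confirmed by the source. The learned options (2) and (3) are omitted because they require trainable parameters.

```python
import numpy as np

def pool_compress(old_mem, c, mode="mean"):
    """Pooling with kernel = stride = c: (n, d) -> (n // c, d)."""
    n, d = old_mem.shape
    k = n // c
    # Group rows into non-overlapping blocks of c (remainder is dropped).
    blocks = old_mem[:k * c].reshape(k, c, d)
    if mode == "mean":
        return blocks.mean(axis=1)
    if mode == "max":
        return blocks.max(axis=1)
    raise ValueError(f"unknown mode: {mode!r}")

def most_used_compress(old_mem, usage, c):
    """Keep the n // c rows with the highest average attention.

    usage: (n,) average attention received by each memory row,
           assumed to be computed elsewhere.
    """
    n = old_mem.shape[0]
    k = n // c
    # Indices of the k most-used rows, highest usage first.
    top = np.argsort(-np.asarray(usage))[:k]
    # Restore temporal order among the kept rows (a sketch assumption).
    return old_mem[np.sort(top)]
```

For example, with six old memories and `c = 3`, both functions return two rows: pooling summarises each block of three, while most-used keeps the two individual rows that received the most attention.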

