Preprint 2019

Generating Long Sequences with Sparse Transformers

Code: openai/sparse_attention · ptillet/torch-blocksparse

Transformers are powerful sequence models, but require time and memory that grow quadratically with the sequence length.

SOTA for Image Generation on CIFAR-10 (test NLL)

Tasks: Audio Generation · Image Generation · Language Modelling
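The paper addresses the quadratic cost by factorizing the attention matrix into sparse patterns, so each position attends to only O(√n) others and the model scales as O(n√n). Below is a minimal NumPy sketch of the "strided" pattern described in the paper, combining its local and strided components into a single causal mask for a single head; the names `strided_sparse_mask` and `masked_attention` are illustrative helpers, not the API of the openai/sparse_attention or ptillet/torch-blocksparse kernels, which implement this with block-sparse GPU operations.

```python
import numpy as np

def strided_sparse_mask(n, stride):
    """Boolean mask for strided factorized attention: position i
    attends to the previous `stride` positions (local component)
    and to every stride-th earlier position (strided component)."""
    i = np.arange(n)[:, None]           # query positions
    j = np.arange(n)[None, :]           # key positions
    causal = j <= i                     # never attend to the future
    local = (i - j) < stride            # recent positions
    strided = ((i - j) % stride) == 0   # every stride-th position
    return causal & (local | strided)

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed entries
    set to -inf before the softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy example: with stride = sqrt(n), each position mixes
# O(sqrt(n)) values instead of O(n).
n, d, stride = 16, 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = masked_attention(q, k, v, strided_sparse_mask(n, stride))
print(out.shape)  # (16, 8)
```

A dense mask like this still costs O(n²) to materialize; the point of the block-sparse kernels in the linked repositories is to skip the masked-out blocks entirely, realizing the O(n√n) compute and memory savings in practice.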
