BigBird is a Transformer with a sparse attention mechanism that reduces the quadratic dependency of self-attention to linear in the number of tokens. BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full-attention model. In particular, BigBird's attention consists of three main parts (combined into a single sparse mask, as sketched below):

* A set of g global tokens that attend to all parts of the sequence.
* All tokens attending to a set of w local neighboring tokens.
* All tokens attending to a set of r random tokens.
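As a rough illustration, here is a minimal NumPy sketch of how the three patterns combine into one boolean attention mask. The function name `bigbird_attention_mask` and all parameter values are illustrative, and the sketch works at the token level for clarity; the released implementation operates on blocks of tokens for hardware efficiency.

```python
import numpy as np

def bigbird_attention_mask(seq_len: int, num_global: int = 2,
                           window: int = 3, num_random: int = 3,
                           seed: int = 0) -> np.ndarray:
    """Token-level BigBird-style sparse attention mask (illustrative).

    mask[i, j] == True means query token i may attend to key token j.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1) Global attention: the first `num_global` tokens attend everywhere
    #    and are attended to by every token.
    mask[:num_global, :] = True
    mask[:, :num_global] = True

    # 2) Sliding window: each token attends to `window` neighbors per side.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 3) Random attention: each token attends to `num_random` random keys.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True

    return mask

# Each query row attends to O(num_global + window + num_random) keys,
# independent of seq_len, so cost grows linearly rather than quadratically.
mask = bigbird_attention_mask(seq_len=16)
print(mask.sum(axis=1))  # per-query number of attended keys
```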
This yields a high-performing attention mechanism that scales to sequence lengths up to 8x longer than previously possible on similar hardware.
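For practical use, a minimal sketch with the Hugging Face `transformers` implementation of BigBird follows; the checkpoint name and hyperparameter values shown are illustrative defaults, not tuned settings.

```python
# Minimal sketch, assuming the Hugging Face `transformers` library and the
# public `google/bigbird-roberta-base` checkpoint are available.
from transformers import BigBirdModel, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",  # BigBird's sparse attention
    block_size=64,                  # tokens per attention block
    num_random_blocks=3,            # random blocks each query block attends to
)

# Note: for very short inputs the implementation falls back to full
# attention, since block-sparse attention needs enough blocks to be useful.
inputs = tokenizer("A long document goes here ...", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```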
Source: Big Bird: Transformers for Longer Sequences
| Task | Papers | Share |
|---|---|---|
| Document Classification | 5 | 12.20% |
| Question Answering | 4 | 9.76% |
| Sentence | 3 | 7.32% |
| Language Modelling | 2 | 4.88% |
| Text Summarization | 2 | 4.88% |
| Classification | 2 | 4.88% |
| Natural Language Inference | 2 | 4.88% |
| Text Classification | 2 | 4.88% |
| Malware Detection | 1 | 2.44% |