Big Bird: Transformers for Longer Sequences

Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism...
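To make the quadratic dependency concrete, the following is a minimal sketch of how the size of a single full-attention score map grows with sequence length. It is not taken from the paper: the function names are hypothetical, and the 4 bytes per score (float32) and the choice of one head, one layer, and one example are assumptions made only for illustration.

```python
def full_attention_entries(seq_len: int) -> int:
    """Count the query-key scores in one full self-attention map.

    Full attention compares every token with every token, so the map
    has seq_len * seq_len entries.
    """
    return seq_len * seq_len


def attention_map_bytes(seq_len: int, bytes_per_score: int = 4) -> int:
    """Estimate memory for one attention map.

    Assumption: 4 bytes per score (float32), a single head, a single
    layer, and a single example. Real models multiply this further.
    """
    return full_attention_entries(seq_len) * bytes_per_score


if __name__ == "__main__":
    # Doubling the sequence length quadruples the number of scores.
    for n in (512, 1024, 2048, 4096):
        mib = attention_map_bytes(n) / 2**20
        print(f"{n:5d} tokens: {full_attention_entries(n):>12,d} scores, {mib:6.1f} MiB")
```

Under these assumptions, going from 512 to 4096 tokens (8 times longer) increases the score map 64-fold, which is the scaling the abstract identifies as the core limitation.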

NeurIPS 2020

Results from the Paper


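| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Text Classification | arXiv | BigBird | Accuracy | 92.31 | #1 |
| Text Summarization | arXiv | BigBird-Pegasus | ROUGE-1 | 46.63 | #1 |
| Text Summarization | arXiv | BigBird-Pegasus | ROUGE-2 | 19.02 | #1 |
| Text Summarization | arXiv | BigBird-Pegasus | ROUGE-L | 41.77 | #1 |
| Document Summarization | BBC XSum | BigBird-Pegasus | ROUGE-1 | 47.12 | #1 |
| Document Summarization | BBC XSum | BigBird-Pegasus | ROUGE-2 | 24.05 | #1 |
| Document Summarization | BBC XSum | BigBird-Pegasus | ROUGE-L | 38.8 | #1 |
| Text Summarization | BigPatent | BigBird-Pegasus | ROUGE-1 | 60.64 | #1 |
| Text Summarization | BigPatent | BigBird-Pegasus | ROUGE-2 | 42.46 | #1 |
| Text Summarization | BigPatent | BigBird-Pegasus | ROUGE-L | 50.01 | #1 |
| Document Summarization | CNN / Daily Mail | BigBird-Pegasus | ROUGE-1 | 43.84 | #4 |
| Document Summarization | CNN / Daily Mail | BigBird-Pegasus | ROUGE-2 | 21.11 | #2 |
| Document Summarization | CNN / Daily Mail | BigBird-Pegasus | ROUGE-L | 40.74 | #1 |
| Linguistic Acceptability | CoLA | BigBird | Accuracy | 58.5% | #13 |
| Chromatin-Profile Prediction | DeepSea | BigBird | TF | 96.1 | #1 |
| Chromatin-Profile Prediction | DeepSea | BigBird | HM | 88.7 | #1 |
| Chromatin-Profile Prediction | DeepSea | BigBird | DHS | 92.1 | #1 |
| Question Answering | HotpotQA | BigBird-etc | Joint F1 | 73.6 | #3 |
| Question Answering | HotpotQA | BigBird-etc | Ans | 81.2 | #2 |
| Question Answering | HotpotQA | BigBird-etc | Sup | 89.1 | #1 |
| Text Classification | Hyperpartisan | BigBird | Accuracy | 92.2 | #1 |
| Text Classification | IMDb | BigBird | Accuracy (2 classes) | 95.2 | #5 |
| Text Classification | IMDb | BigBird | Accuracy (10 classes) | - | #3 |
| Semantic Textual Similarity | MRPC | BigBird | F1 | 91.5 | #4 |
| Natural Language Inference | MultiNLI | BigBird | Matched | 87.5 | #13 |
| Question Answering | Natural Questions | BigBird-etc | F1 (Long) | 77.7 | #1 |
| Question Answering | Natural Questions | BigBird-etc | F1 (Short) | 57.8 | #1 |
| Text Classification | Patents | BigBird | Accuracy | 69.3 | #1 |
| Text Summarization | Pubmed | BigBird-Pegasus | ROUGE-1 | 46.32 | #2 |
| Text Summarization | Pubmed | BigBird-Pegasus | ROUGE-2 | 20.65 | #1 |
| Text Summarization | Pubmed | BigBird-Pegasus | ROUGE-L | 42.33 | #2 |
| Natural Language Inference | QNLI | BigBird | Accuracy | 92.2% | #15 |
| Question Answering | Quora Question Pairs | BigBird | Accuracy | 88.6% | #15 |
| Natural Language Inference | RTE | BigBird | Accuracy | 75.0% | #13 |
| Sentiment Analysis | SST-2 Binary classification | BigBird | Accuracy | 94.6 | #19 |
| Semantic Textual Similarity | STS Benchmark | BigBird | Spearman Correlation | 0.878 | #5 |
| Question Answering | TriviaQA | BigBird-etc | F1 | 80.9 | #2 |
| Question Answering | WikiHop | BigBird-etc | Test | 82.3 | #1 |
| Text Classification | Yelp-5 | BigBird | Accuracy | 72.16% | #3 |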
