LongT5: Efficient Text-To-Text Transformer for Long Sequences

Recent work has shown that either (1) increasing the input length or (2) increasing the model size can improve the performance of Transformer-based neural models. In this paper, we present LongT5, a new model with which we explore the effects of scaling both the input length and the model size at the same time. Specifically, we integrate attention ideas from long-input transformers (ETC) and pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC's local/global attention mechanism but without requiring additional side inputs. We achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
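The abstract's key idea, Transient Global attention, can be illustrated with a simplified sketch: each token attends to its local neighbors plus "transient" global tokens that are pooled on the fly from fixed-size blocks of the input, so no extra side inputs are needed. The sketch below is a minimal single-head illustration, not the paper's implementation; the pooling choice (mean), the radius and block-size values, and the omission of learned projections, layer norms, and multi-head structure are all simplifying assumptions.

```python
import numpy as np

def tglobal_attention(x, local_radius=2, block_size=4):
    """Simplified single-head sketch of Transient Global (TGlobal) attention.

    Each position attends to (a) tokens within `local_radius` and
    (b) transient global tokens formed by mean-pooling fixed blocks of
    the input. Learned Q/K/V projections are omitted for clarity.
    """
    n, d = x.shape
    # Transient global tokens: one pooled summary per block of the input.
    n_blocks = (n + block_size - 1) // block_size
    global_tokens = np.stack([
        x[i * block_size:(i + 1) * block_size].mean(axis=0)
        for i in range(n_blocks)
    ])
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - local_radius), min(n, i + local_radius + 1)
        keys = np.concatenate([x[lo:hi], global_tokens])  # local + global keys
        scores = keys @ x[i] / np.sqrt(d)                 # scaled dot products
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                          # softmax
        out[i] = weights @ keys                           # values = keys here
    return out
```

Because each token sees only O(local_radius + n/block_size) keys rather than all n, the cost grows far more slowly with input length than full self-attention, which is what makes scaling to long sequences practical.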

Findings of NAACL 2022

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Text Summarization | arXiv | LongT5 | ROUGE-1 | 48.35 | # 4 |
| | | | ROUGE-2 | 21.92 | # 2 |
| | | | ROUGE-L | 44.27 | # 3 |
| Text Summarization | BigPatent | LongT5 | ROUGE-1 | 76.87 | # 1 |
| | | | ROUGE-2 | 66.06 | # 1 |
| | | | ROUGE-L | 70.76 | # 1 |
| Abstractive Text Summarization | CNN / Daily Mail | LongT5 | ROUGE-1 | 43.94 | # 15 |
| | | | ROUGE-2 | 21.40 | # 9 |
| | | | ROUGE-L | 41.28 | # 8 |
| Multi-Document Summarization | Multi-News | LongT5 | ROUGE-2 | 19.43 | # 2 |
| | | | ROUGE-1 | 48.17 | # 2 |
| | | | ROUGE-SU4 | 24.94 | # 1 |
| Text Summarization | Pubmed | LongT5 | ROUGE-1 | 50.23 | # 2 |
| | | | ROUGE-2 | 24.76 | # 1 |
| | | | ROUGE-L | 46.67 | # 1 |
| Long-range modeling | SCROLLS | LongT5 XL | GovRep | 54.7 / 28.2 / 30.2 | # 4 |
| | | | SumScr | 35.8 / 9.6 / 21.1 | # 1 |
| | | | QMSum | 34.9 / 11.8 / 23.5 | # 2 |
| | | | Qspr | 53.1 | # 1 |
| | | | Nrtv | 29.3 | # 1 |
| | | | QALT EM-T/H | 46.0 / 42.1 | # 1 |
| | | | CNLI | 88.2 | # 2 |
| | | | Avg. | 41.89 | # 1 |
| Long-range modeling | SCROLLS | LongT5 Large | GovRep | 54.2 / 27.8 / 29.8 | # 5 |
| | | | SumScr | 35.6 / 9.2 / 21.2 | # 3 |
| | | | QMSum | 35.1 / 12.0 / 23.3 | # 1 |
| | | | Qspr | 52.3 | # 2 |
| | | | Nrtv | 27.2 | # 2 |
| | | | QALT EM-T/H | 40.6 / 38.6 | # 3 |
| | | | CNLI | 87.3 | # 3 |
| | | | Avg. | 40.47 | # 2 |
| Long-range modeling | SCROLLS | LongT5 Base | GovRep | 53.5 / 27.3 / 29.3 | # 7 |
| | | | SumScr | 34.8 / 9.6 / 21.1 | # 5 |
| | | | QMSum | 33.9 / 11.0 / 22.8 | # 3 |
| | | | Qspr | 46.6 | # 3 |
| | | | Nrtv | 23.0 | # 4 |
| | | | QALT EM-T/H | 37.9 / 36.6 | # 4 |
| | | | CNLI | 85.6 | # 4 |
| | | | Avg. | 38.22 | # 3 |

Methods