LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC) and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call *Transient Global* (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side inputs. We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
Findings (NAACL) 2022
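To make the TGlobal idea concrete, below is a minimal single-head NumPy sketch, not the authors' implementation: it omits learned query/key/value projections, multi-head splitting, relative position biases, and the normalization the paper applies to the global-token sums, and it uses toy defaults (the paper uses a block size of k = 16 and a local radius of 127). The function name and defaults are illustrative only.

```python
import numpy as np

def tglobal_attention(x, radius=2, block_size=4):
    """Single-head sketch of Transient Global (TGlobal) attention.

    Each token attends to a local window of `radius` tokens on either
    side plus one "transient global" token per block of `block_size`
    inputs. Global tokens are just block sums of the current layer's
    inputs, so they are rebuilt at every layer with no side inputs
    (unlike ETC's precomputed global tokens).
    """
    n, d = x.shape

    # Build transient global tokens: pad to a whole number of blocks,
    # then sum each block. (The paper also normalizes these sums and
    # adds a block position bias; omitted here for brevity.)
    n_blocks = -(-n // block_size)  # ceiling division
    xp = np.pad(x, ((0, n_blocks * block_size - n), (0, 0)))
    g = xp.reshape(n_blocks, block_size, d).sum(axis=1)

    # Keys/values: the input tokens themselves plus the global tokens.
    kv = np.concatenate([x, g], axis=0)
    scores = x @ kv.T / np.sqrt(d)

    # Mask: token i sees tokens j with |i - j| <= radius, plus all globals.
    mask = np.full((n, n + n_blocks), -np.inf)
    for i in range(n):
        mask[i, max(0, i - radius):min(n, i + radius + 1)] = 0.0
    mask[:, n:] = 0.0

    # Numerically stable softmax over the allowed positions.
    s = scores + mask
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

# Toy check: 10 tokens with 8-dim states.
print(tglobal_attention(np.random.randn(10, 8)).shape)  # (10, 8)
```

Because each token attends to O(radius) local positions plus n/k global tokens, the cost grows roughly as O(n(r + n/k)) rather than the O(n^2) of full attention, which is what lets input length scale.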
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Text Summarization | arXiv | LongT5 | ROUGE-1 | 48.35 | # 9 |
| | | | ROUGE-2 | 21.92 | # 2 |
| | | | ROUGE-L | 44.27 | # 5 |
| Text Summarization | BigPatent | LongT5 | ROUGE-1 | 76.87 | # 1 |
| | | | ROUGE-2 | 66.06 | # 1 |
| | | | ROUGE-L | 70.76 | # 1 |
| Abstractive Text Summarization | CNN / Daily Mail | LongT5 | ROUGE-1 | 43.94 | # 21 |
| | | | ROUGE-2 | 21.40 | # 13 |
| | | | ROUGE-L | 41.28 | # 14 |
| Multi-Document Summarization | Multi-News | LongT5 | ROUGE-1 | 48.17 | # 2 |
| | | | ROUGE-2 | 19.43 | # 2 |
| | | | ROUGE-SU4 | 24.94 | # 1 |
| Text Summarization | PubMed | LongT5 | ROUGE-1 | 50.23 | # 3 |
| | | | ROUGE-2 | 24.76 | # 1 |
| | | | ROUGE-L | 46.67 | # 1 |
| Long-range modeling | SCROLLS | LongT5 XL | GovRep | 61.1 / 32.3 / 33.7 | # 1 |
| | | | SumScr | 35.8 / 9.6 / 21.1 | # 3 |
| | | | QMSum | 34.9 / 11.8 / 23.5 | # 3 |
| | | | Qspr | 53.1 | # 2 |
| | | | Nrtv | 29.3 | # 2 |
| | | | QALT EM-T/H | 46.0 / 42.1 | # 1 |
| | | | CNLI | 88.2 | # 3 |
| | | | Avg. | 42.53 | # 2 |
| Long-range modeling | SCROLLS | LongT5 Base | GovRep | 57.7 / 30.0 / 31.4 | # 5 |
| | | | SumScr | 34.8 / 9.6 / 21.1 | # 7 |
| | | | QMSum | 33.9 / 11.0 / 22.8 | # 5 |
| | | | Qspr | 46.6 | # 6 |
| | | | Nrtv | 23.0 | # 7 |
| | | | QALT EM-T/H | 37.9 / 36.6 | # 4 |
| | | | CNLI | 85.6 | # 7 |
| | | | Avg. | 38.6 | # 5 |
| Long-range modeling | SCROLLS | LongT5 Large | GovRep | 61.3 / 32.2 / 33.8 | # 11 |
| | | | SumScr | 60.3 / 31.1 / 32.8 | # 1 |
| | | | QMSum | 35.1 / 12.0 / 23.3 | # 1 |
| | | | Qspr | 52.3 | # 3 |
| | | | Nrtv | 27.2 | # 3 |
| | | | QALT EM-T/H | 40.6 / 38.6 | # 3 |
| | | | CNLI | 87.3 | # 4 |
| | | | Avg. | 41.03 | # 3 |
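For readers who want to try the model, here is a short usage sketch, assuming the Hugging Face Transformers port of LongT5 and its `google/long-t5-tglobal-base` checkpoint (a community port, not the authors' original training code):

```python
# Sketch: summarizing a long document with a pretrained LongT5 checkpoint,
# assuming the Hugging Face Transformers port is installed and the
# "google/long-t5-tglobal-base" checkpoint is available.
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

name = "google/long-t5-tglobal-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = LongT5ForConditionalGeneration.from_pretrained(name)

document = "..."  # a long input document (thousands of tokens)
inputs = tokenizer(document, return_tensors="pt",
                   truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```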