LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC) and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call *Transient Global* (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side inputs. We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
Findings (NAACL) 2022
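To make the TGlobal idea concrete, below is a minimal single-head NumPy sketch, not the authors' implementation: it omits learned query/key/value projections, multi-head splitting, relative position biases, and the normalization the paper applies to the global-token sums, and it uses toy defaults (the paper uses a block size of k = 16 and a local radius of 127). The function name and defaults are illustrative only.

```python
import numpy as np

def tglobal_attention(x, radius=2, block_size=4):
    """Single-head sketch of Transient Global (TGlobal) attention.

    Each token attends to a local window of `radius` tokens on either
    side plus one "transient global" token per block of `block_size`
    inputs. Global tokens are just block sums of the current layer's
    inputs, so they are rebuilt at every layer with no side inputs
    (unlike ETC's precomputed global tokens).
    """
    n, d = x.shape

    # Build transient global tokens: pad to a whole number of blocks,
    # then sum each block. (The paper also normalizes these sums and
    # adds a block position bias; omitted here for brevity.)
    n_blocks = -(-n // block_size)  # ceiling division
    xp = np.pad(x, ((0, n_blocks * block_size - n), (0, 0)))
    g = xp.reshape(n_blocks, block_size, d).sum(axis=1)

    # Keys/values: the input tokens themselves plus the global tokens.
    kv = np.concatenate([x, g], axis=0)
    scores = x @ kv.T / np.sqrt(d)

    # Mask: token i sees tokens j with |i - j| <= radius, plus all globals.
    mask = np.full((n, n + n_blocks), -np.inf)
    for i in range(n):
        mask[i, max(0, i - radius):min(n, i + radius + 1)] = 0.0
    mask[:, n:] = 0.0

    # Numerically stable softmax over the allowed positions.
    s = scores + mask
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

# Toy check: 10 tokens with 8-dim states.
print(tglobal_attention(np.random.randn(10, 8)).shape)  # (10, 8)
```

Because each token attends to O(radius) local positions plus n/k global tokens, the cost grows roughly as O(n(r + n/k)) rather than the O(n^2) of full attention, which is what lets input length scale.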
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Text Summarization | arXiv | LongT5 | ROUGE-1 | 48.35 | # 9 |
| | | | ROUGE-2 | 21.92 | # 2 |
| | | | ROUGE-L | 44.27 | # 5 |
| Text Summarization | BigPatent | LongT5 | ROUGE-1 | 76.87 | # 1 |
| | | | ROUGE-2 | 66.06 | # 1 |
| | | | ROUGE-L | 70.76 | # 1 |
| Abstractive Text Summarization | CNN / Daily Mail | LongT5 | ROUGE-1 | 43.94 | # 21 |
| | | | ROUGE-2 | 21.40 | # 13 |
| | | | ROUGE-L | 41.28 | # 14 |
| Multi-Document Summarization | Multi-News | LongT5 | ROUGE-1 | 48.17 | # 2 |
| | | | ROUGE-2 | 19.43 | # 2 |
| | | | ROUGE-SU4 | 24.94 | # 1 |
| Text Summarization | PubMed | LongT5 | ROUGE-1 | 50.23 | # 3 |
| | | | ROUGE-2 | 24.76 | # 1 |
| | | | ROUGE-L | 46.67 | # 1 |
| Long-range modeling | SCROLLS | LongT5 XL | GovRep | 61.1 / 32.3 / 33.7 | # 1 |
| | | | SumScr | 35.8 / 9.6 / 21.1 | # 3 |
| | | | QMSum | 34.9 / 11.8 / 23.5 | # 3 |
| | | | Qspr | 53.1 | # 2 |
| | | | Nrtv | 29.3 | # 2 |
| | | | QALT EM-T/H | 46.0 / 42.1 | # 1 |
| | | | CNLI | 88.2 | # 3 |
| | | | Avg. | 42.53 | # 2 |
| Long-range modeling | SCROLLS | LongT5 Base | GovRep | 57.7 / 30.0 / 31.4 | # 5 |
| | | | SumScr | 34.8 / 9.6 / 21.1 | # 7 |
| | | | QMSum | 33.9 / 11.0 / 22.8 | # 5 |
| | | | Qspr | 46.6 | # 6 |
| | | | Nrtv | 23.0 | # 7 |
| | | | QALT EM-T/H | 37.9 / 36.6 | # 4 |
| | | | CNLI | 85.6 | # 7 |
| | | | Avg. | 38.6 | # 5 |
| Long-range modeling | SCROLLS | LongT5 Large | GovRep | 61.3 / 32.2 / 33.8 | # 11 |
| | | | SumScr | 60.3 / 31.1 / 32.8 | # 1 |
| | | | QMSum | 35.1 / 12.0 / 23.3 | # 1 |
| | | | Qspr | 52.3 | # 3 |
| | | | Nrtv | 27.2 | # 3 |
| | | | QALT EM-T/H | 40.6 / 38.6 | # 3 |
| | | | CNLI | 87.3 | # 4 |
| | | | Avg. | 41.03 | # 3 |
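For readers who want to try the model, here is a short usage sketch, assuming the Hugging Face Transformers port of LongT5 and its `google/long-t5-tglobal-base` checkpoint (a community port, not the authors' original training code):

```python
# Sketch: summarizing a long document with a pretrained LongT5 checkpoint,
# assuming the Hugging Face Transformers port is installed and the
# "google/long-t5-tglobal-base" checkpoint is available.
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

name = "google/long-t5-tglobal-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = LongT5ForConditionalGeneration.from_pretrained(name)

document = "..."  # a long input document (thousands of tokens)
inputs = tokenizer(document, return_tensors="pt",
                   truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```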