SCROLLS: Standardized CompaRison Over Long Language Sequences

NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing information across the input. SCROLLS contains summarization, question answering, and natural language inference tasks, covering multiple domains, including literature, science, business, and entertainment. Initial baselines, including Longformer Encoder-Decoder (LED), indicate that there is ample room for improvement on SCROLLS. We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
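The unified text-to-text format makes all seven tasks consumable through a single interface. Below is a minimal loading sketch, assuming the Hugging Face Hub release at "tau/scrolls"; the config and field names are taken from that release and may differ in other mirrors.

```python
# A minimal sketch of loading one SCROLLS task in its unified
# text-to-text format, assuming the Hub release at "tau/scrolls".
from datasets import load_dataset

# One config per task: "gov_report", "summ_screen_fd", "qmsum", "qasper",
# "narrative_qa", "quality", "contract_nli". Recent versions of `datasets`
# may additionally require trust_remote_code=True for script-based datasets.
dataset = load_dataset("tau/scrolls", "gov_report")

example = dataset["train"][0]
print(example["input"][:500])  # long source document (plus the query, where applicable)
print(example["output"])       # target text: summary, answer, or entailment label
```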

Results from the Paper


Task: long-range modeling. Dataset: SCROLLS. GovRep, SumScr, and QMSum are summarization tasks scored with ROUGE-1 / ROUGE-2 / ROUGE-L; Qspr and Nrtv are question-answering tasks scored with F1; QALT is multiple-choice QA scored with exact match on the full test set and its hard subset (EM-T / EM-H); CNLI is natural language inference scored with exact match. Parenthesized numbers are each entry's global rank on the live leaderboard.

Metric               | LED Base                 | BART Base                | Naive
---------------------+--------------------------+--------------------------+-------------------------
GovRep (R-1/2/L)     | 56.2 / 26.6 / 28.8 (#7)  | 47.9 / 18.6 / 22.7 (#9)  | 45.3 / 17.9 / 20.8 (#10)
SumScr (R-1/2/L)     | 24.2 / 4.5 / 15.4 (#10)  | 27.2 / 4.9 / 16.7 (#9)   | 19.6 / 1.8 / 11.0 (#11)
QMSum (R-1/2/L)      | 25.1 / 6.7 / 18.8 (#10)  | 30.2 / 8.7 / 20.7 (#9)   | 14.2 / 2.0 / 9.3 (#11)
Qspr (F1)            | 26.6 (#8)                | 26.3 (#9)                | 3.4 (#10)
Nrtv (F1)            | 18.5 (#8)                | 15.4 (#9)                | 1.5 (#10)
QALT (EM-T / EM-H)   | 25.8 / 25.4 (#8)         | 26.0 / 25.9 (#7)         | 25.2 / 26.1 (#9)
CNLI (EM)            | 71.5 (#9)                | 77.4 (#8)                | 66.0 (#10)
Avg. (SCROLLS score) | 29.16 (#8)               | 29.01 (#9)               | 19.35 (#10)
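
The Avg. row is the aggregate SCROLLS score. The sketch below shows how it appears to be computed, assuming the aggregation described in the SCROLLS paper: the geometric mean of ROUGE-1/2/L within each summarization task, QALT represented by its full-test EM (EM-T), and an arithmetic mean across the seven datasets. The helper name scrolls_score is hypothetical.

```python
from statistics import geometric_mean

# Hypothetical helper reproducing the Avg. row: each ROUGE-based
# summarization task contributes the geometric mean of its ROUGE-1/2/L
# scores, QALT contributes EM-T, and the benchmark score is the
# arithmetic mean over the seven datasets.
def scrolls_score(gov_rep, sum_scr, qmsum, qspr, nrtv, qalt_em_t, cnli):
    rouge = [geometric_mean(triple) for triple in (gov_rep, sum_scr, qmsum)]
    per_dataset = rouge + [qspr, nrtv, qalt_em_t, cnli]
    return sum(per_dataset) / len(per_dataset)

# LED Base column from the table above; prints ~29.14 (the small gap to the
# reported 29.16 comes from feeding in the rounded values shown above).
print(scrolls_score(
    gov_rep=(56.2, 26.6, 28.8),
    sum_scr=(24.2, 4.5, 15.4),
    qmsum=(25.1, 6.7, 18.8),
    qspr=26.6,
    nrtv=18.5,
    qalt_em_t=25.8,
    cnli=71.5,
))
```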
