TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Long-range modeling	SCROLLS	BART-large SLED	GovRep	57.5 / 26.3 / 27.4	# 6
Long-range modeling	SCROLLS	BART-large SLED	SumScr	35.2 / 8.7 / 19.4	# 5
Long-range modeling	SCROLLS	BART-large SLED	QMSum	34.2 / 11.0 / 22.0	# 4
Long-range modeling	SCROLLS	BART-large SLED	Qspr	46.9	# 5
Long-range modeling	SCROLLS	BART-large SLED	Nrtv	24.1	# 6
Long-range modeling	SCROLLS	BART-large SLED	QALT EM-T/H	34.8 / 34.8	# 6
Long-range modeling	SCROLLS	BART-large SLED	CNLI	87.3	# 4
Long-range modeling	SCROLLS	BART-large SLED	Avg.	37.99	# 6

Efficient Long-Text Understanding with Short-Text Models

1 Aug 2022 · Maor Ivgi, Uri Shaham, Jonathan Berant

Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles and long documents, due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
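The chunk-then-fuse idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `overlapping_chunks` and `fuse_for_decoder`, the `chunk_size` and `overlap` parameters, and the plain concatenation used for fusion are all assumptions made for illustration. The abstract does not specify how overlapping positions are handled before decoding, so this sketch simply keeps every encoded position.

```python
from typing import List, Sequence


def overlapping_chunks(tokens: Sequence[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Split `tokens` into windows of at most `chunk_size` tokens.

    Consecutive windows share `overlap` tokens (hypothetical parameter names;
    the paper's actual chunking scheme may differ).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if len(tokens) == 0:
        return []

    stride = chunk_size - overlap  # how far each window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once the current window reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
        start += stride
    return chunks


def fuse_for_decoder(encoded_chunks: Sequence[Sequence]) -> list:
    """Concatenate per-chunk encoder outputs along the sequence axis.

    Assumption: the decoder attends over this single concatenated sequence,
    which is the general shape of fusion-in-decoder.
    """
    return [state for chunk in encoded_chunks for state in chunk]


if __name__ == "__main__":
    token_ids = list(range(10))
    chunks = overlapping_chunks(token_ids, chunk_size=4, overlap=2)
    # A real system would run a short-text encoder on each chunk here;
    # the identity mapping stands in for that step.
    fused = fuse_for_decoder(chunks)
    print(len(chunks), len(fused))
```

Because each chunk is encoded independently, encoder cost grows linearly with the number of chunks rather than quadratically with the full input length; only the decoder sees the whole fused sequence.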

Code

mivg/sled official

Tasks

Long-range modeling

Natural Language Understanding

Datasets

SQuAD

HotpotQA

NarrativeQA

GovReport

QuALITY

QASPER

QMSum

SummScreen

SCROLLS

ContractNLI

Results from the Paper

Ranked #6 on Long-range modeling on SCROLLS
