TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Long-range modeling	LRA	SGConv	ListOps	61.45	# 5
Long-range modeling	LRA	SGConv	Text	89.2	# 5
Long-range modeling	LRA	SGConv	Retrieval	91.11	# 5
Long-range modeling	LRA	SGConv	Image	87.97	# 8
Long-range modeling	LRA	SGConv	Pathfinder	95.46	# 4
Long-range modeling	LRA	SGConv	Avg	87.17	# 6
Long-range modeling	LRA	SGConv	Pathfinder-X	97.83	# 4

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/what-makes-convolutional-models-great-on-long/long-range-modeling-on-lra)](https://paperswithcode.com/sota/long-range-modeling-on-lra?p=what-makes-convolutional-models-great-on-long)`

What Makes Convolutional Models Great on Long Sequence Modeling?

17 Oct 2022 · Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey

Convolutional models have been widely used in multiple domains. However, most existing models only use local convolution, making the model unable to handle long-range dependency efficiently. Attention overcomes this problem by aggregating global information but also makes the computational complexity quadratic to the sequence length. Recently, Gu et al. [2021] proposed a model called S4 inspired by the state space model. S4 can be efficiently implemented as a global convolutional model whose kernel size equals the input sequence length. S4 can model much longer sequences than Transformers and achieve significant gains over SoTA on several long-range tasks. Despite its empirical success, S4 is involved. It requires sophisticated parameterization and initialization schemes. As a result, S4 is less intuitive and hard to use. Here we aim to demystify S4 and extract basic principles that contribute to the success of S4 as a global convolutional model. We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient in the sense that the number of parameters should scale sub-linearly with sequence length. 2) The kernel needs to satisfy a decaying structure that the weights for convolving with closer neighbors are larger than the more distant ones. Based on the two principles, we propose a simple yet effective convolutional model called Structured Global Convolution (SGConv). SGConv exhibits strong empirical performance over several tasks: 1) With faster speed, SGConv surpasses S4 on Long Range Arena and Speech Command datasets. 2) When plugging SGConv into standard language and vision models, it shows the potential to improve both efficiency and performance.
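The two principles in the abstract can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The names `build_kernel` and `global_conv`, the decay factor 0.5, and the sub-kernel size are hypothetical choices. Nearest-neighbour upsampling via `np.repeat` is a simplification chosen for brevity. The kernel is assembled from a few small sub-kernels, each stretched to cover a longer span and scaled down by a growing power of the decay factor, so the parameter count grows roughly logarithmically with kernel length and more distant positions receive smaller weights.

```python
import numpy as np

def build_kernel(sub_kernels, decay=0.5):
    """Assemble a long kernel from small sub-kernels (illustrative only).

    Sub-kernel i is stretched by 2**max(i - 1, 0) and scaled by decay**i,
    so later (more distant) segments are longer and carry smaller weights.
    """
    pieces = []
    for i, w in enumerate(sub_kernels):
        stretch = 2 ** max(i - 1, 0)
        # Nearest-neighbour upsampling: an assumption made for simplicity.
        pieces.append((decay ** i) * np.repeat(w, stretch))
    return np.concatenate(pieces)

def global_conv(x, kernel):
    """Causal convolution of x with a kernel as long as x, computed via FFT."""
    n = len(x)
    kernel = kernel[:n]   # only the first n taps can affect the first n outputs
    fft_len = 2 * n       # zero-pad so circular convolution equals linear convolution
    y = np.fft.irfft(np.fft.rfft(x, fft_len) * np.fft.rfft(kernel, fft_len), fft_len)
    return y[:n]

rng = np.random.default_rng(0)
d, num_scales = 4, 6      # hypothetical sizes: 6 sub-kernels of 4 weights each
sub_kernels = [rng.standard_normal(d) for _ in range(num_scales)]
kernel = build_kernel(sub_kernels, decay=0.5)
num_params = sum(w.size for w in sub_kernels)

x = rng.standard_normal(len(kernel))
y = global_conv(x, kernel)
print(len(kernel), num_params, y.shape)
```

With these sizes, each additional sub-kernel doubles the kernel length while adding only `d` parameters, which is one way to satisfy the sub-linear scaling requirement. The FFT route keeps the cost of the global convolution at O(L log L) instead of the O(L^2) of a direct sum.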

Code

ctlllll/sgconv (official), 157 stars

Tasks

Long-range modeling

Datasets

GLUE

WikiText-2

WikiText-103

Speech Commands

LRA

Results from the Paper

Ranked #6 on Long-range modeling on LRA

Methods

Convolution
