Long-range modeling

30 papers with code • 2 benchmarks • 4 datasets

A new task for testing the long-sequence modeling capabilities and efficiency of language models.

Image credit: SCROLLS: Standardized CompaRison Over Long Language Sequences



Most implemented papers

Long Range Arena: A Benchmark for Efficient Transformers

google-research/long-range-arena 8 Nov 2020

In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Efficiently Modeling Long Sequences with Structured State Spaces

hazyresearch/state-spaces ICLR 2022

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.

Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

kenziyuliu/ms-g3d CVPR 2020

Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics.

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

hazyresearch/h3 28 Dec 2022

First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.

SCROLLS: Standardized CompaRison Over Long Language Sequences

tau-nlp/scrolls 10 Jan 2022

NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild.

Diagonal State Spaces are as Effective as Structured State Spaces

hazyresearch/state-spaces 27 Mar 2022

Modeling long range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video.

On the Parameterization and Initialization of Diagonal State Space Models

hazyresearch/state-spaces 23 Jun 2022

On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix.
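The diagonal restriction described above can be sketched in a few lines: with a diagonal state matrix, each state dimension evolves independently, so the SSM's convolution kernel is just a weighted sum of decaying complex exponentials. The snippet below is a hypothetical minimal illustration in NumPy, not the official DSS/S4D code; the function names and the simplified Euler-style discretization are assumptions for clarity.

```python
import numpy as np

def diagonal_ssm_kernel(A, B, C, step, L):
    """Kernel K[l] = Re(sum_n C_n * B_n * exp(A_n * step * l)), l = 0..L-1.

    A: complex diagonal of the state matrix (Re(A) < 0 for stability);
    B, C: input/output projections; step: discretization step size.
    (Simplified discretization for illustration only.)
    """
    l = np.arange(L)
    # Each diagonal state evolves independently: shape (state, L).
    powers = np.exp(A[:, None] * step * l[None, :])
    return np.real((C * B) @ powers)

def apply_ssm(u, A, B, C, step):
    """Run the diagonal SSM over a 1-D sequence as a causal convolution."""
    L = len(u)
    K = diagonal_ssm_kernel(A, B, C, step, L)
    return np.convolve(u, K)[:L]

# Toy usage: 4 complex states with negative real parts (stable dynamics).
rng = np.random.default_rng(0)
A = -0.5 + 1j * rng.standard_normal(4)
B = rng.standard_normal(4).astype(complex)
C = rng.standard_normal(4).astype(complex)
u = rng.standard_normal(16)
y = apply_ssm(u, A, B, C, step=0.1)
```

Because the kernel factors into per-state exponentials, it can be computed in O(state x length) time, which is the efficiency win the diagonal parameterization buys over a dense state matrix.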

Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration

164140757/scm 21 Jul 2022

Weakly Supervised Object Localization (WSOL), which aims to localize objects by only using image-level labels, has attracted much attention because of its low annotation cost in real applications.

Mega: Moving Average Equipped Gated Attention

facebookresearch/mega 21 Sep 2022

The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.
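The moving-average component that gives Mega its stronger inductive bias can be illustrated with a damped exponential moving average (EMA) applied along the sequence before attention. The sketch below is a hypothetical scalar version; the parameter names `alpha` and `delta` follow the paper's EMA recurrence, but this is not Mega's actual multi-dimensional implementation.

```python
import numpy as np

def damped_ema(x, alpha, delta):
    """Damped EMA: y[t] = alpha * x[t] + (1 - alpha * delta) * y[t-1].

    alpha controls how much weight the current input gets; delta damps
    the history, interpolating between a plain EMA (delta = 1) and a
    near-cumulative sum (delta -> 0).
    """
    y = np.zeros_like(x, dtype=float)
    prev = 0.0
    for t in range(len(x)):
        prev = alpha * x[t] + (1.0 - alpha * delta) * prev
        y[t] = prev
    return y

# Toy usage: a constant input is smoothed toward its steady-state value.
x = np.ones(8)
y = damped_ema(x, alpha=0.5, delta=1.0)
```

Each output position mixes in an exponentially decaying summary of all earlier positions, injecting a position-aware, local-to-global bias that plain dot-product attention lacks.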

T-former: An Efficient Transformer for Image Inpainting

dengyecode/t-former_image_inpainting 12 May 2023

Based on this attention mechanism, a network called $T$-former is designed for image inpainting.