30 papers with code • 2 benchmarks • 4 datasets
A new task for testing the long-sequence modeling capabilities and efficiency of language models.
Image credit: SCROLLS: Standardized CompaRison Over Long Language Sequences
Most implemented papers
Long Range Arena: A Benchmark for Efficient Transformers
In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle the quadratic cost of self-attention, more often than not claiming superior or comparable model quality to vanilla Transformer models.
Efficiently Modeling Long Sequences with Structured State Spaces
A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.
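As a rough illustration of what a state-space sequence layer computes, the sketch below runs a plain discretized linear state-space recurrence over a 1-D signal. The shapes, discretization scheme, and parameter choices are illustrative assumptions, not S4's structured parameterization or code.

```python
# Minimal sketch (not the S4 implementation): a discretized linear state-space
# recurrence x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k, run step by step.
import numpy as np

def ssm_recurrence(u, A, B, C, dt=0.1):
    """Run a single-input single-output state-space model over a 1-D sequence u."""
    n = A.shape[0]
    A_bar = np.eye(n) + dt * A      # forward-Euler discretization of x' = Ax + Bu
    B_bar = dt * B
    x = np.zeros(n)
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k  # state update
        ys.append(C @ x)             # readout
    return np.array(ys)

# Example: a small random, roughly stable system on a length-16 input.
rng = np.random.default_rng(0)
n_state = 4
A = -np.eye(n_state) + 0.1 * rng.standard_normal((n_state, n_state))
B = rng.standard_normal(n_state)
C = rng.standard_normal(n_state)
y = ssm_recurrence(rng.standard_normal(16), A, B, C)
print(y.shape)  # (16,)
```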
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics.
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.
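One widely used synthetic probe of this kind is associative recall: the model reads a sequence of key-value pairs and must output the value paired with a queried key. The generator below is a hedged sketch of that task format; the vocabulary and prompt layout are chosen for illustration and are not taken from the paper's data pipeline.

```python
# Illustrative associative-recall task generator (assumed format, not the paper's code).
import random

def make_associative_recall(num_pairs=8, vocab_keys="abcdefgh", vocab_vals="01234567", seed=0):
    rng = random.Random(seed)
    keys = rng.sample(vocab_keys, num_pairs)            # distinct keys
    vals = [rng.choice(vocab_vals) for _ in keys]        # value for each key
    query = rng.choice(keys)                              # key the model must recall
    prompt = " ".join(f"{k} {v}" for k, v in zip(keys, vals)) + f" {query}"
    target = vals[keys.index(query)]
    return prompt, target

prompt, target = make_associative_recall()
print(prompt)   # key-value pairs followed by the queried key
print(target)   # the value originally paired with that key
```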
SCROLLS: Standardized CompaRison Over Long Language Sequences
NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild.
Diagonal State Spaces are as Effective as Structured State Spaces
Modeling long-range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio, and video.
On the Parameterization and Initialization of Diagonal State Space Models
A recent variant of S4, called DSS, showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix.
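To make the diagonal idea concrete, the sketch below builds a convolution kernel from a purely diagonal state matrix, with an initialization in the spirit of the S4D-Lin pattern (real part -1/2, imaginary part proportional to n). The exact constants, discretization, and shapes here are assumptions for illustration, not the paper's parameterization.

```python
# Hedged sketch: with a diagonal state matrix, the SSM convolution kernel is a
# sum of exponentials and can be computed without materializing full matrices.
import numpy as np

def diagonal_ssm_kernel(L, N=16, dt=0.01):
    n = np.arange(N)
    A = -0.5 + 1j * np.pi * n                           # diagonal state matrix (its eigenvalues)
    B = np.ones(N, dtype=complex)
    C = np.random.default_rng(0).standard_normal(N).astype(complex)
    A_bar = np.exp(dt * A)                               # per-entry discretization
    B_bar = (A_bar - 1.0) / A * B
    powers = A_bar[None, :] ** np.arange(L)[:, None]     # (L, N): A_bar_n ** k
    K = (powers * (C * B_bar)[None, :]).sum(axis=1)      # K[k] = sum_n C_n A_bar_n^k B_bar_n
    return K.real                                        # real part; the papers pair conjugate modes

K = diagonal_ssm_kernel(L=64)
u = np.random.default_rng(1).standard_normal(64)
y = np.convolve(u, K)[:64]                               # apply the SSM as a causal convolution
print(K.shape, y.shape)                                  # (64,) (64,)
```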
Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration
Weakly Supervised Object Localization (WSOL), which aims to localize objects by only using image-level labels, has attracted much attention because of its low annotation cost in real applications.
Mega: Moving Average Equipped Gated Attention
The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.
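The moving-average component can be pictured as a damped exponential moving average applied along the sequence before gated attention. The snippet below is a simplified one-dimensional illustration with assumed parameter names and values, not the paper's multi-dimensional EMA implementation.

```python
# Hedged sketch of a damped EMA along the sequence dimension.
import numpy as np

def damped_ema(x, alpha=0.3, delta=0.9):
    """y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}  (illustrative form)."""
    y = np.zeros_like(x)
    prev = 0.0
    for t, x_t in enumerate(x):
        prev = alpha * x_t + (1.0 - alpha * delta) * prev
        y[t] = prev
    return y

x = np.random.default_rng(0).standard_normal(32)
print(damped_ema(x)[:5])  # smoothed sequence; the influence of past inputs decays geometrically
```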
T-former: An Efficient Transformer for Image Inpainting
Based on this attention, a network called $T$-former is designed for image inpainting.