TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Generation	CIFAR-10	Logsparse (6 layers)	bits/dimension	4.253	# 71
Image Generation	ImageNet 64x64	Logsparse (6 layers)	Bits per dim	4.351	# 27

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting

NeurIPS 2019 · Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, Xifeng Yan

Time series forecasting is an important problem across many domains, including prediction of solar plant energy output, electricity consumption, and traffic congestion. In this paper, we propose to tackle this forecasting problem with the Transformer [1]. Although its performance in our preliminary study was impressive, we found two major weaknesses: (1) locality-agnostics: the point-wise dot-product self-attention in the canonical Transformer architecture is insensitive to local context, which can make the model prone to anomalies in time series; (2) memory bottleneck: the space complexity of the canonical Transformer grows quadratically with sequence length $L$, making direct modeling of long time series infeasible. To address these two issues, we first propose convolutional self-attention, which produces queries and keys with causal convolution so that local context is better incorporated into the attention mechanism. We then propose the LogSparse Transformer, which has only $O(L(\log L)^{2})$ memory cost and improves forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget. Our experiments on both synthetic data and real-world datasets show that it compares favorably to the state of the art.
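The two ideas in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation. `causal_conv1d` and `logsparse_indices` are hypothetical helper names, and the sparsity pattern (each position attends to itself and to earlier positions at power-of-two distances) is an assumption made for illustration, since the abstract does not spell out the exact pattern. Under that assumption one layer stores on the order of $L \log L$ attention cells, which would match the stated $O(L(\log L)^{2})$ bound if roughly $\log L$ such layers are stacked.

```python
def causal_conv1d(x, kernel):
    """Causal 1-D convolution over a scalar series.

    output[t] depends only on x[t-k+1], ..., x[t] (k = kernel length).
    The input is left-padded with zeros so the output keeps the same length.
    In convolutional self-attention, a filter like this would produce the
    queries and keys instead of a point-wise projection.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


def logsparse_indices(l):
    """Positions that position l (0-indexed) attends to in one layer.

    Assumed pattern: l itself, then l-1, l-2, l-4, l-8, ... while >= 0.
    """
    idx = [l]
    step = 1
    while l - step >= 0:
        idx.append(l - step)
        step *= 2
    return sorted(idx)


def attention_cells(seq_len):
    """Total number of (query, key) pairs kept in one log-sparse layer."""
    return sum(len(logsparse_indices(l)) for l in range(seq_len))


if __name__ == "__main__":
    # A width-2 summing filter: each output mixes the current and previous step.
    print(causal_conv1d([1, 2, 3, 4], [1, 1]))
    # Position 10 looks back at distances 1, 2, 4 and 8.
    print(logsparse_indices(10))
    # Compare against dense causal attention, which keeps L(L+1)/2 pairs.
    L = 1024
    print(attention_cells(L), L * (L + 1) // 2)
```

Because each position keeps only a logarithmic number of keys, information from distant steps has to travel through several layers, which is why depth is traded for memory in this kind of design.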

Code

AIStream-Peelout/flow-forecast

1,891

mlpotter/Transformer_Time_Series

500

Tasks

Time Series

Time Series Analysis

Time Series Forecasting

Datasets

CIFAR-10

Results from the Paper

Ranked #27 on Image Generation on ImageNet 64x64 (Bits per dim metric)
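For readers unfamiliar with the metric, bits per dimension is commonly computed as the average negative log-likelihood of an image in nats, divided by the number of dimensions and by ln 2. The sketch below shows that conversion. `bits_per_dim` is a hypothetical helper name, and the dimension counts assume 3-channel images (32 x 32 x 3 for CIFAR-10, 64 x 64 x 3 for ImageNet 64x64). The NLL values are back-computed from the listed scores purely to demonstrate the round trip, not taken from the paper.

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits/dimension."""
    return nll_nats / (num_dims * math.log(2))


if __name__ == "__main__":
    cifar_dims = 32 * 32 * 3        # assumed: 3-channel 32x32 images
    imagenet64_dims = 64 * 64 * 3   # assumed: 3-channel 64x64 images

    # Illustrative NLL values derived from the listed scores (not reported data).
    cifar_nll = 4.253 * cifar_dims * math.log(2)
    imagenet_nll = 4.351 * imagenet64_dims * math.log(2)

    print(round(bits_per_dim(cifar_nll, cifar_dims), 3))
    print(round(bits_per_dim(imagenet_nll, imagenet64_dims), 3))
```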

Methods

Absolute Position Encodings • Adam • BPE • Causal Convolution • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
