$\infty$-former: Infinite Memory Transformer

1 Sep 2021 · Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length. While variations of efficient transformers have been proposed, they all have a finite memory capacity and are forced to drop old information. In this paper, we propose the $\infty$-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the $\infty$-former's attention complexity becomes independent of the context length, trading off memory length with precision. To control where precision matters most, the $\infty$-former maintains "sticky memories", which allow it to model arbitrarily long contexts while keeping the computation budget fixed. Experiments on a synthetic sorting task, language modeling, and document grounded dialogue generation demonstrate the $\infty$-former's ability to retain information from long sequences.
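The key idea in the abstract, attending over a fixed-size continuous representation of an unbounded memory, can be illustrated with a short sketch. The snippet below is a minimal NumPy illustration built on assumptions of my own (Gaussian RBF basis functions, a ridge-regression fit of the memory signal, and grid quadrature for the expectation); names such as `fit_memory_coefficients` and `continuous_attention` are hypothetical and do not come from the paper's released code.

```python
# Minimal sketch of continuous attention over an unbounded long-term memory,
# in the spirit of the Infinity-former abstract. Basis choice, the ridge fit,
# and all names here are illustrative assumptions, not the paper's code.
import numpy as np

def rbf_basis(t, centers, width=0.05):
    """Evaluate N Gaussian radial basis functions at positions t in [0, 1]."""
    # t: (T,) positions; centers: (N,) basis centers -> returns (T, N)
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def fit_memory_coefficients(X, centers, ridge=1e-3):
    """Compress L token embeddings X (L, d) into N coefficients B (N, d).

    The memory is treated as a continuous signal x(t) ~ B^T psi(t); B is fit
    by ridge regression of X onto the basis evaluated at the token positions.
    Downstream attention cost depends on N, not on the context length L.
    """
    L = X.shape[0]
    t = (np.arange(L) + 1) / L                      # token positions in (0, 1]
    Psi = rbf_basis(t, centers)                     # (L, N)
    A = Psi.T @ Psi + ridge * np.eye(len(centers))  # (N, N)
    return np.linalg.solve(A, Psi.T @ X)            # (N, d)

def continuous_attention(query, B, centers, W_mu, W_sigma, n_quad=200):
    """Attend over the continuous memory with a Gaussian density.

    The query is mapped to a mean and variance on [0, 1]; the context vector
    is the expectation of x(t) = B^T psi(t) under that Gaussian, approximated
    here with simple quadrature on a fixed grid.
    """
    mu = 1.0 / (1.0 + np.exp(-query @ W_mu))            # squash mean into (0, 1)
    sigma = np.logaddexp(0.0, query @ W_sigma) + 1e-3   # softplus std dev
    t = np.linspace(0.0, 1.0, n_quad)
    density = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    density /= density.sum()                            # normalize quadrature weights
    expected_psi = density @ rbf_basis(t, centers)      # (N,) = E[psi(t)]
    return expected_psi @ B                             # (d,) context vector

# Toy usage: 10k past token states compressed into 64 basis coefficients,
# then queried with a single attention query vector.
rng = np.random.default_rng(0)
d, L, N = 32, 10_000, 64
X = rng.normal(size=(L, d))                             # stand-in for old hidden states
centers = np.linspace(0.0, 1.0, N)
B = fit_memory_coefficients(X, centers)
W_mu, W_sigma = rng.normal(size=d), rng.normal(size=d)
context = continuous_attention(rng.normal(size=d), B, centers, W_mu, W_sigma)
print(context.shape)                                    # (32,)
```

Note that the attention step touches only the N basis coefficients, which is what makes its cost independent of how many tokens were folded into the memory.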


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Dialogue Generation | CMU-DoG | ∞-former (Sticky memories) | F1 | 9.01 | 1 |
| Dialogue Generation | CMU-DoG | ∞-former (Sticky memories) | ROUGE-1 | 15.37 | 1 |
| Dialogue Generation | CMU-DoG | ∞-former (Sticky memories) | ROUGE-L | 12.56 | 1 |
| Dialogue Generation | CMU-DoG | ∞-former (Sticky memories) | METEOR | 7.55 | 1 |
| Language Modelling | PG-19 | ∞-former (Sticky memories + initialized GPT-2 Small) | Perplexity | 32.48 | 1 |
| Language Modelling | WikiText-103 | ∞-former (initialized GPT-2 Small) | Test perplexity | 16.64 | 17 |
| Language Modelling | WikiText-103 | ∞-former (Sticky memories) | Test perplexity | 24.22 | 56 |
| Language Modelling | WikiText-103 | ∞-former (Sticky memories + initialized GPT-2 Small) | Test perplexity | 16.61 | 14 |
