TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Language Modelling	WikiText-103	kNN-LM w/ Continuous Cache	Validation perplexity	15.81	# 4
Language Modelling	WikiText-103	kNN-LM w/ Continuous Cache	Test perplexity	15.79	# 10
Language Modelling	WikiText-103	kNN-LM w/ Continuous Cache	Number of params	247M	# 19
Language Modelling	WikiText-103	kNN-LM	Validation perplexity	16.06	# 6
Language Modelling	WikiText-103	kNN-LM	Test perplexity	16.12	# 12
Language Modelling	WikiText-103	kNN-LM	Number of params	247M	# 19

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/generalization-through-memorization-nearest/language-modelling-on-wikitext-103)](https://paperswithcode.com/sota/language-modelling-on-wikitext-103?p=generalization-through-memorization-nearest)`

Generalization through Memorization: Nearest Neighbor Language Models

ICLR 2020 · Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis

We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong WikiText-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79, a 2.9 point improvement with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge. Together, these results strongly suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail.
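
The interpolation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract only states that a kNN model over LM embeddings is linearly interpolated with the base LM. The squared L2 distance, the softmax over negative distances, the toy datastore, the function names, and the interpolation weight `lam=0.25` are all assumptions chosen for illustration.

```python
import numpy as np


def knn_distribution(query, keys, values, vocab_size, k=2):
    """Turn the k nearest datastore entries into a distribution over the vocabulary."""
    # Squared L2 distance between the query context vector and every stored key
    # (the choice of distance is an assumption for this sketch).
    dists = np.sum((keys - query) ** 2, axis=1)
    # Indices of the k closest keys.
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances: closer neighbors get more weight.
    logits = -dists[nearest]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Sum the weights of all neighbors that store the same target token.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nearest], weights)
    return p_knn


def interpolate(p_lm, p_knn, lam=0.25):
    """Linear interpolation of the base LM and kNN distributions (lam is hypothetical)."""
    return lam * p_knn + (1.0 - lam) * p_lm


# Toy datastore: 4 context vectors (keys) and the token id that followed each (values).
keys = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
values = np.array([2, 2, 1, 0])
query = np.array([0.1, 0.0])      # embedding of the test context
p_lm = np.array([0.5, 0.3, 0.2])  # made-up base LM prediction over a 3-token vocabulary

p_knn = knn_distribution(query, keys, values, vocab_size=3, k=2)
p_final = interpolate(p_lm, p_knn, lam=0.25)
print(p_knn, p_final)
```

In this toy run both retrieved neighbors store token 2, so the kNN distribution puts all of its mass there and the interpolated prediction shifts from token 0 to token 2. Replacing `keys` and `values` with entries from a different text collection changes `p_knn` without touching the base LM, which is the mechanism the abstract refers to when it mentions varying the datastore without further training.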

Code

urvashik/knnlm (official), 301 stars

labmlai/annotated_deep_learning_pap… (annotated code at labml.ai), 47,906 stars

neulab/knn-transformers, 262 stars

cordercorder/knn-models

MS-Mind/MS-Code-06

Tasks

Domain Adaptation

Language Modelling

Memorization

Datasets

WikiText-2

WikiText-103

Results from the Paper

Ranked #10 on Language Modelling on WikiText-103
