Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.

ICLR 2018
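The "simple and effective method" behind the AWD-LSTM-MoS results below is a mixture of softmaxes (MoS) output layer: instead of a single softmax over one logit matrix, the model mixes several softmaxes with context-dependent weights, which raises the rank of the log-probability matrix. The PyTorch sketch below illustrates this idea; the class name MixtureOfSoftmaxes, the tanh nonlinearity on the component contexts, and the hyperparameters d_model, vocab_size, and n_components are illustrative assumptions rather than the paper's exact AWD-LSTM-MoS configuration.

```python
# Minimal sketch of a mixture-of-softmaxes (MoS) output head.
# Hyperparameters (d_model, vocab_size, n_components) are illustrative
# assumptions, not the settings used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfSoftmaxes(nn.Module):
    """Replaces a single softmax output layer with a weighted mixture of
    K softmaxes, lifting the rank restriction of a single logit matrix."""

    def __init__(self, d_model: int, vocab_size: int, n_components: int = 5):
        super().__init__()
        self.n_components = n_components
        # Prior (mixture-weight) network: one weight per component.
        self.prior = nn.Linear(d_model, n_components)
        # Projects the context vector into K component-specific contexts.
        self.latent = nn.Linear(d_model, n_components * d_model)
        # Output projection shared by all components.
        self.decoder = nn.Linear(d_model, vocab_size)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d_model) context vectors from the RNN.
        batch = hidden.size(0)
        # Mixture weights, normalized over the K components.
        pi = F.softmax(self.prior(hidden), dim=-1)          # (batch, K)
        # Component-specific context vectors.
        h = torch.tanh(self.latent(hidden))                 # (batch, K*d)
        h = h.view(batch, self.n_components, -1)            # (batch, K, d)
        # Per-component softmax over the vocabulary.
        probs = F.softmax(self.decoder(h), dim=-1)          # (batch, K, V)
        # Mix in probability space; the log of this mixture is high-rank.
        mixed = torch.einsum("bk,bkv->bv", pi, probs)       # (batch, V)
        return torch.log(mixed + 1e-8)                      # log-probabilities


# Usage example with random context vectors.
if __name__ == "__main__":
    mos = MixtureOfSoftmaxes(d_model=32, vocab_size=100, n_components=5)
    context = torch.randn(4, 32)
    log_probs = mos(context)
    print(log_probs.shape)  # torch.Size([4, 100])
```

In the full model this head sits on top of an AWD-LSTM; with a single component the layer reduces to the standard softmax, which is the low-rank case the paper identifies as the bottleneck.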
Task                 Dataset                      Model                         Metric                  Value   Global Rank
Language Modelling   Penn Treebank (Word Level)   AWD-LSTM-MoS + dynamic eval   Validation perplexity   48.33   #7
                                                                                Test perplexity         47.69   #11
                                                                                Params                  22M     #23
Language Modelling   Penn Treebank (Word Level)   AWD-LSTM-MoS                  Validation perplexity   56.54   #16
                                                                                Test perplexity         54.44   #20
                                                                                Params                  22M     #23
Language Modelling   WikiText-2                   AWD-LSTM-MoS                  Validation perplexity   63.88   #19
                                                                                Test perplexity         61.45   #26
                                                                                Number of params        35M     #12
Language Modelling   WikiText-2                   AWD-LSTM-MoS + dynamic eval   Validation perplexity   42.41   #8
                                                                                Test perplexity         40.68   #16
                                                                                Number of params        35M     #12

Methods