Deep Residual Output Layers for Neural Language Generation

14 May 2019 · Nikolaos Pappas, James Henderson

Many tasks, including language generation, benefit from learning the structure of the output space, particularly when the space of output labels is large and the data is sparse. State-of-the-art neural language models indirectly capture the output space structure in their classifier weights since they lack parameter sharing across output labels. Learning shared output label mappings helps, but existing methods have limited expressivity and are prone to overfitting. In this paper, we investigate the usefulness of more powerful shared mappings for output labels, and propose a deep residual output mapping with dropout between layers to better capture the structure of the output space and avoid overfitting. Evaluations on three language generation tasks show that our output label mapping can match or improve state-of-the-art recurrent and self-attention architectures, and suggest that the classifier does not necessarily need to be high-rank to better model natural language if it is better at capturing the structure of the output space.

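As a rough illustration of the idea in the abstract, the sketch below implements a deep residual output mapping in PyTorch: the label (output) embedding matrix, typically tied with the input embedding, is refined by a small stack of residual blocks with dropout between layers, and the refined embeddings score each decoder state with a dot product. The class name, block count, dropout rate, and exact parameterization are illustrative assumptions, not the paper's precise architecture.

```python
import torch
import torch.nn as nn


class ResidualOutputMapping(nn.Module):
    """Sketch of a deep residual output layer: the shared label embedding
    matrix is refined by residual blocks with dropout, and the refined
    embeddings score each decoder state with a dot product."""

    def __init__(self, label_emb: nn.Embedding, num_blocks: int = 2, dropout: float = 0.5):
        super().__init__()
        dim = label_emb.embedding_dim
        # Label embeddings, typically tied with the input embedding matrix.
        self.label_emb = label_emb
        # Residual blocks with dropout between layers (sizes are illustrative).
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(dropout))
            for _ in range(num_blocks)
        )
        self.bias = nn.Parameter(torch.zeros(label_emb.num_embeddings))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Refine the label embeddings: e <- e + block(e) for each block.
        e = self.label_emb.weight                   # (V, dim)
        for block in self.blocks:
            e = e + block(e)
        # Logits over the whole vocabulary for each decoder state.
        return hidden @ e.t() + self.bias           # (..., V)


# Example usage with random decoder states (hypothetical sizes).
vocab_size, dim = 10_000, 400
embedding = nn.Embedding(vocab_size, dim)
output_layer = ResidualOutputMapping(embedding)
logits = output_layer(torch.randn(32, dim))         # shape: (32, vocab_size)
```

Because the mapping is shared across all output labels, labels with few training occurrences can still benefit from structure learned on frequent ones, which is the motivation stated in the abstract.
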
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DRILL + dynamic eval | Validation perplexity | 49.5 | # 10 |
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DRILL + dynamic eval | Test perplexity | 49.4 | # 12 |
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DRILL + dynamic eval | Params | 24M | # 7 |
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DRILL | Validation perplexity | 58.2 | # 21 |
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DRILL | Test perplexity | 55.7 | # 25 |
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DRILL | Params | 24M | # 7 |
| Language Modelling | WikiText-2 | AWD-LSTM-DRILL | Validation perplexity | 64.9 | # 20 |
| Language Modelling | WikiText-2 | AWD-LSTM-DRILL | Test perplexity | 61.9 | # 28 |
| Language Modelling | WikiText-2 | AWD-LSTM-DRILL | Number of params | 34M | # 20 |
| Language Modelling | WikiText-2 | AWD-LSTM-DRILL + dynamic eval | Validation perplexity | 43.9 | # 9 |
| Language Modelling | WikiText-2 | AWD-LSTM-DRILL + dynamic eval | Test perplexity | 42.0 | # 17 |
| Language Modelling | WikiText-2 | AWD-LSTM-DRILL + dynamic eval | Number of params | 34M | # 20 |
| Machine Translation | WMT2014 English-German | Transformer-DRILL Base | BLEU score | 28.1 | # 49 |
