TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Language Modelling	enwik8	Recurrent Highway Networks	Bit per Character (BPC)	1.27	# 36
Language Modelling	enwik8	Recurrent Highway Networks	Number of params	46M	# 24
Language Modelling	Hutter Prize	Large RHN	Bit per Character (BPC)	1.27	# 16
Language Modelling	Hutter Prize	Large RHN	Number of params	46M	# 10
Language Modelling	Hutter Prize	RHN - depth 5 [zilly2016recurrent]	Bit per Character (BPC)	1.31	# 18
Language Modelling	Penn Treebank (Word Level)	Recurrent highway networks	Validation perplexity	67.9	# 26
Language Modelling	Penn Treebank (Word Level)	Recurrent highway networks	Test perplexity	65.4	# 33
Language Modelling	Penn Treebank (Word Level)	Recurrent highway networks	Params	23M	# 19
Language Modelling	Text8	Large RHN	Bit per Character (BPC)	1.27	# 17
Language Modelling	Text8	Large RHN	Number of params	46M	# 7

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/recurrent-highway-networks/language-modelling-on-hutter-prize)](https://paperswithcode.com/sota/language-modelling-on-hutter-prize?p=recurrent-highway-networks)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/recurrent-highway-networks/language-modelling-on-text8)](https://paperswithcode.com/sota/language-modelling-on-text8?p=recurrent-highway-networks)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/recurrent-highway-networks/language-modelling-on-penn-treebank-word)](https://paperswithcode.com/sota/language-modelling-on-penn-treebank-word?p=recurrent-highway-networks)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/recurrent-highway-networks/language-modelling-on-enwiki8)](https://paperswithcode.com/sota/language-modelling-on-enwiki8?p=recurrent-highway-networks)`

Recurrent Highway Networks

ICML 2017 · Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutník, Jürgen Schmidhuber

Many sequential processing tasks require complex nonlinear transition functions from one step to the next. However, recurrent neural networks with 'deep' transition functions remain difficult to train, even when using Long Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem that illuminates several modeling and optimization issues and improves our understanding of the LSTM cell. Based on this analysis we propose Recurrent Highway Networks, which extend the LSTM architecture to allow step-to-step transition depths larger than one. Several language modeling experiments demonstrate that the proposed architecture results in powerful and efficient models. On the Penn Treebank corpus, solely increasing the transition depth from 1 to 10 improves word-level perplexity from 90.6 to 65.4 using the same number of parameters. On the larger Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform all previous results and achieve an entropy of 1.27 bits per character.
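The abstract names Geršgorin's circle theorem without stating it. The theorem says that every eigenvalue of a square matrix lies in at least one disc centred on a diagonal entry, with radius equal to the sum of the absolute off-diagonal entries in that row. The sketch below only illustrates the theorem itself; the 2x2 matrix `A` and all helper names are hypothetical, and how the paper applies the theorem to recurrent networks is not reproduced here.

```python
import cmath


def gershgorin_discs(matrix):
    """Return one (center, radius) pair per row of a square matrix."""
    discs = []
    for i, row in enumerate(matrix):
        center = row[i]
        # Radius: sum of absolute values of the off-diagonal entries in row i.
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        discs.append((center, radius))
    return discs


def eigenvalues_2x2(matrix):
    """Closed-form eigenvalues of a 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = matrix
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)


def in_some_disc(value, discs, tol=1e-12):
    """True if `value` lies inside at least one of the discs."""
    return any(abs(value - center) <= radius + tol for center, radius in discs)


# Hypothetical stand-in for a small recurrent Jacobian (not taken from the paper).
A = [[0.9, 0.3],
     [-0.2, 0.5]]

discs = gershgorin_discs(A)
eigs = eigenvalues_2x2(A)

# The union of the discs also bounds the spectral radius from above.
spectral_radius_bound = max(abs(center) + radius for center, radius in discs)

for eig in eigs:
    print(eig, in_some_disc(eig, discs))
```

The discs are cheap to compute from the matrix entries alone, so they give a bound on where eigenvalues can lie without an eigendecomposition.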

Code

julian121266/RecurrentHighwayNetwor… (official) · 402 stars
labmlai/annotated_deep_learning_pap… · 48,779 stars · annotated code at labml.ai
jzilly/RecurrentHighwayNetworks · 402 stars
vermaMachineLearning/Pytorch-JIT-Re…
davidsvaughn/dts-tf

Tasks

Language Modelling

Datasets

Penn Treebank • JSB Chorales • Text8 • Hutter Prize

Results from the Paper

Ranked #16 on Language Modelling on Hutter Prize
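The figures reported for this paper use two different metrics: bits per character (BPC) for enwik8, Hutter Prize and Text8, and perplexity for word-level Penn Treebank. The sketch below shows how the two relate, assuming the standard definitions (BPC as the average negative log2-likelihood per character, perplexity as the exponentiated average negative log-likelihood per token). This page does not define the metrics, so those definitions and the function names are assumptions; the input numbers are the ones quoted on this page.

```python
import math


def bpc_to_char_perplexity(bpc):
    """Per-character perplexity implied by a bits-per-character score."""
    return 2.0 ** bpc


def perplexity_to_bits_per_token(perplexity):
    """Average bits per token implied by a perplexity score."""
    return math.log2(perplexity)


def relative_reduction(before, after):
    """Fractional improvement when a lower-is-better metric drops."""
    return (before - after) / before


# Values quoted on this page.
char_ppl = bpc_to_char_perplexity(1.27)            # character-level result
bits_per_word = perplexity_to_bits_per_token(65.4)  # Penn Treebank test perplexity
depth_gain = relative_reduction(90.6, 65.4)         # transition depth 1 vs 10

print(f"per-character perplexity at 1.27 BPC: {char_ppl:.2f}")
print(f"bits per word at perplexity 65.4: {bits_per_word:.2f}")
print(f"relative perplexity reduction: {depth_gain:.1%}")
```

Character-level and word-level scores are not directly comparable even after conversion, because the prediction units differ.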

Methods

LSTM • Sigmoid Activation • Tanh Activation
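To make the listed components concrete, here is a minimal sketch of one time step of a highway-style recurrence with transition depth greater than one: a tanh candidate, a sigmoid transform gate, and a carry gate. It assumes the coupled variant in which the carry gate is one minus the transform gate, and it feeds the input only into the first micro-layer. The exact equations are not given on this page, so the formulation, the parameter names (`W_h`, `R_t`, `gate_bias`, and so on) and the initialisation are all assumptions for illustration, not the authors' implementation.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, vec, bias):
    """Compute weights @ vec + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_params(input_size, hidden_size, depth, gate_bias=-2.0, seed=0):
    """Small random weights; a negative gate bias starts the gates mostly in 'carry' mode."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_h": mat(hidden_size, input_size),  # input -> candidate (first micro-layer only)
        "W_t": mat(hidden_size, input_size),  # input -> transform gate (first micro-layer only)
        "layers": [
            {"R_h": mat(hidden_size, hidden_size), "b_h": [0.0] * hidden_size,
             "R_t": mat(hidden_size, hidden_size), "b_t": [gate_bias] * hidden_size}
            for _ in range(depth)
        ],
    }


def rhn_step(x, s, params):
    """Advance state s by one time step through all micro-layers."""
    zeros = [0.0] * len(s)
    for depth, layer in enumerate(params["layers"]):
        pre_h = affine(layer["R_h"], s, layer["b_h"])
        pre_t = affine(layer["R_t"], s, layer["b_t"])
        if depth == 0:
            # Assumption: the external input enters only at the first micro-layer.
            pre_h = [a + b for a, b in zip(pre_h, affine(params["W_h"], x, zeros))]
            pre_t = [a + b for a, b in zip(pre_t, affine(params["W_t"], x, zeros))]
        h = [math.tanh(v) for v in pre_h]   # candidate update
        t = [sigmoid(v) for v in pre_t]     # transform gate
        # Coupled carry gate (1 - t): blend the candidate with the previous state.
        s = [hi * ti + si * (1.0 - ti) for hi, ti, si in zip(h, t, s)]
    return s


params = init_params(input_size=3, hidden_size=4, depth=5)
state = [0.0] * 4
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    state = rhn_step(x, state, params)
print(state)
```

With the coupled gate, each micro-layer outputs a convex combination of the tanh candidate and the incoming state, so a state that starts inside (-1, 1) stays there regardless of depth.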
