An Analysis of Neural Language Modeling at Multiple Scales

22 Mar 2018 · Stephen Merity, Nitish Shirish Keskar, Richard Socher

Many of the leading approaches in language modeling introduce novel, complex, and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity...


Evaluation results from the paper


| Task | Dataset | Model | Metric | Value | Global rank |
| --- | --- | --- | --- | --- | --- |
| Language Modelling | Hutter Prize | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.232 | #6 |
| Language Modelling | Hutter Prize | 3-layer AWD-LSTM | Number of params | 47M | #1 |
| Language Modelling | Penn Treebank (Character Level) | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.175 | #3 |
| Language Modelling | Penn Treebank (Character Level) | 3-layer AWD-LSTM | Number of params | 13.8M | #1 |
| Language Modelling | Penn Treebank (Character Level) | 6-layer QRNN | Bits per Character (BPC) | 1.187 | #4 |
| Language Modelling | Penn Treebank (Character Level) | 6-layer QRNN | Number of params | 13.8M | #1 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Validation perplexity | 32.0 | #5 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Test perplexity | 33.0 | #7 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Number of params | 151M | #1 |
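The table mixes two evaluation metrics: bits per character (BPC) for the character-level datasets and perplexity for word-level WikiText-103. Both are simple transformations of the model's average cross-entropy loss per token, so results can be converted between conventions. A minimal sketch of the standard conversions (function names are illustrative, not from the paper's code):

```python
import math

def bpc_from_nats(loss_nats: float) -> float:
    """Bits per character from the average per-character
    cross-entropy loss measured in nats (change of log base)."""
    return loss_nats / math.log(2)

def perplexity_from_nats(loss_nats: float) -> float:
    """Perplexity is the exponential of the average per-token
    cross-entropy loss in nats."""
    return math.exp(loss_nats)

# A per-character loss of ~0.854 nats corresponds to ~1.232 BPC,
# the Hutter Prize figure above; a per-word loss of ~3.497 nats
# corresponds to the WikiText-103 test perplexity of ~33.0.
print(round(bpc_from_nats(1.232 * math.log(2)), 3))
print(round(perplexity_from_nats(math.log(33.0)), 1))
```

Note that BPC and word-level perplexity are not directly comparable across datasets, since a word spans several characters and the per-token loss is averaged over different units.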