An Analysis of Neural Language Modeling at Multiple Scales

22 Mar 2018 • Stephen Merity • Nitish Shirish Keskar • Richard Socher

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively.
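For orientation, a plain character-level LSTM language model of the kind being tuned can be written in a few lines of PyTorch. The sketch below is an illustration under stated assumptions, not the paper's implementation: it leaves out the AWD-LSTM regularization (weight-dropped recurrent weights, variable-length BPTT, and related techniques), and every size is a placeholder.

```python
# Minimal character-level LSTM language model in PyTorch.
# Illustrative sketch only: this is NOT the paper's AWD-LSTM, which adds
# weight-dropped recurrent connections, variable-length BPTT, and further
# regularization; all sizes below are placeholder assumptions.
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=512,
                 num_layers=3, dropout=0.2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers,
                            dropout=dropout, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) of character ids
        out, hidden = self.lstm(self.embedding(x), hidden)
        return self.decoder(out), hidden   # logits: (batch, seq_len, vocab)

# One next-character prediction step on a dummy batch.
model = CharLSTM(vocab_size=50)
criterion = nn.CrossEntropyLoss()
batch = torch.randint(0, 50, (8, 100))     # 8 sequences of 100 character ids
logits, _ = model(batch[:, :-1])
loss = criterion(logits.reshape(-1, 50), batch[:, 1:].reshape(-1))
loss.backward()
```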

Evaluation


| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Language Modelling | Hutter Prize | 3-layer AWD-LSTM | Bit per Character (BPC) | 1.232 | #7 |
| Language Modelling | Hutter Prize | 3-layer AWD-LSTM | Number of params | 47M | #7 |
| Language Modelling | Penn Treebank (Character Level) | 3-layer AWD-LSTM | Bit per Character (BPC) | 1.175 | #2 |
| Language Modelling | Penn Treebank (Character Level) | 3-layer AWD-LSTM | Number of params | 13.8M | #2 |
| Language Modelling | Penn Treebank (Character Level) | 6-layer QRNN | Bit per Character (BPC) | 1.187 | #3 |
| Language Modelling | Penn Treebank (Character Level) | 6-layer QRNN | Number of params | 13.8M | #3 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Validation perplexity | 32.0 | #6 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Test perplexity | 33.0 | #6 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Number of params | 151M | #6 |
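The table mixes two metric families: bits per character (BPC) for the character-level datasets and perplexity for word-level WikiText-103. Both are simple transforms of the model's average cross-entropy. The helpers below are a minimal sketch in Python showing the standard conversions; the example loss value is illustrative, not a number reported in the paper.

```python
import math

def bpc_from_nats(loss_nats: float) -> float:
    """Bits per character: per-character cross-entropy converted to log base 2."""
    return loss_nats / math.log(2)

def perplexity_from_nats(loss_nats: float) -> float:
    """Perplexity: exponential of the per-token cross-entropy in nats."""
    return math.exp(loss_nats)

# Illustrative only: a per-character loss of ~0.814 nats corresponds to
# ~1.174 BPC, roughly the Penn Treebank (character-level) row above.
print(bpc_from_nats(0.814))
```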