An Analysis of Neural Language Modeling at Multiple Scales

22 Mar 2018  ·  Stephen Merity, Nitish Shirish Keskar, Richard Socher

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
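
To make the setup concrete, below is a minimal sketch in PyTorch of a character-level LSTM language model of the kind the paper scales up. This is not the authors' AWD-LSTM, which additionally applies weight-dropped (DropConnect) recurrent connections and carefully tuned regularization; all layer sizes and hyperparameters here are illustrative assumptions.

```python
# Minimal character-level LSTM language model sketch (illustrative only,
# NOT the paper's AWD-LSTM; hyperparameters are placeholder assumptions).
import torch
import torch.nn as nn


class CharLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=512,
                 num_layers=3, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers,
                            dropout=dropout, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) of character indices
        emb = self.embed(x)
        output, hidden = self.lstm(emb, hidden)
        logits = self.decoder(output)  # (batch, seq_len, vocab_size)
        return logits, hidden


# Toy usage: predict the next character at every position.
vocab_size = 50                                 # e.g. characters in the corpus
model = CharLSTM(vocab_size)
x = torch.randint(0, vocab_size, (8, 64))       # batch of 8 sequences, 64 chars
y = torch.randint(0, vocab_size, (8, 64))       # targets: input shifted by one
logits, _ = model(x)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), y.reshape(-1))
print(loss.item())
```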

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Language Modelling | enwik8 | AWD-LSTM (3 layers) | Bit per Character (BPC) | 1.232 | #33 |
| Language Modelling | enwik8 | AWD-LSTM (3 layers) | Number of params | 47M | #22 |
| Language Modelling | Hutter Prize | 3-layer AWD-LSTM | Bit per Character (BPC) | 1.232 | #13 |
| Language Modelling | Hutter Prize | 3-layer AWD-LSTM | Number of params | 47M | #8 |
| Language Modelling | Penn Treebank (Character Level) | 3-layer AWD-LSTM | Bit per Character (BPC) | 1.175 | #7 |
| Language Modelling | Penn Treebank (Character Level) | 3-layer AWD-LSTM | Number of params | 13.8M | #8 |
| Language Modelling | Penn Treebank (Character Level) | 6-layer QRNN | Bit per Character (BPC) | 1.187 | #9 |
| Language Modelling | Penn Treebank (Character Level) | 6-layer QRNN | Number of params | 13.8M | #8 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Validation perplexity | 32.0 | #32 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Test perplexity | 33.0 | #75 |
| Language Modelling | WikiText-103 | 4-layer QRNN | Number of params | 151M | #29 |
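Both metrics in the table are simple transformations of the model's average cross-entropy loss: bits per character is the mean negative log-likelihood per character (in nats) divided by ln 2, and perplexity is the exponential of the mean negative log-likelihood per word. A small helper sketch follows; the example input values are illustrative, not reproduced results.

```python
# Convert mean cross-entropy (nats per symbol) into the table's metrics.
import math


def bits_per_character(mean_nll_nats: float) -> float:
    """Mean negative log-likelihood in nats/char -> bits/char (BPC)."""
    return mean_nll_nats / math.log(2)


def perplexity(mean_nll_nats: float) -> float:
    """Mean negative log-likelihood in nats/word -> perplexity."""
    return math.exp(mean_nll_nats)


# Illustrative values only: ~0.854 nats/char is ~1.232 BPC,
# and ~3.497 nats/word is ~33 perplexity.
print(bits_per_character(0.854))  # ≈ 1.232
print(perplexity(3.497))          # ≈ 33.0
```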

Methods


LSTM · QRNN · AWD-LSTM