Direct Output Connection for a High-Rank Language Model

EMNLP 2018 · Sho Takase, Jun Suzuki, Masaaki Nagata

This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from the final RNN layer but also from middle layers. The proposed method raises the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). It improves on the current state-of-the-art language model and achieves the best scores on the Penn Treebank and WikiText-2, the standard benchmark datasets. Moreover, we show that the proposed method also contributes to two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
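The core idea, a direct output connection (DOC), is to compute a word distribution from the hidden state of each RNN layer (not only the final one) and combine these distributions with learned mixture weights, generalizing the mixture-of-softmaxes view of Yang et al. (2018). The sketch below is a minimal PyTorch illustration of that mixing step, not the authors' released implementation; the class name, the choice of predicting the mixture weights from the final hidden state, and all hyperparameters are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class DirectOutputConnectionSketch(nn.Module):
    """Illustrative DOC-style output layer: word distributions are computed
    from the hidden states of several RNN layers and combined with learned
    mixture weights. Names and design details are assumptions for
    illustration, not the authors' released code."""

    def __init__(self, hidden_size: int, vocab_size: int, num_layers: int,
                 softmaxes_per_layer: int = 1):
        super().__init__()
        self.softmaxes_per_layer = softmaxes_per_layer
        self.num_dists = num_layers * softmaxes_per_layer
        # One projection onto the vocabulary per component distribution.
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size) for _ in range(self.num_dists)]
        )
        # Mixture weights predicted from the final layer's hidden state.
        self.prior = nn.Linear(hidden_size, self.num_dists)

    def forward(self, hidden_states):
        # hidden_states: list of (batch, hidden_size) tensors, one per RNN
        # layer, ordered from the first middle layer to the final layer.
        pi = torch.softmax(self.prior(hidden_states[-1]), dim=-1)  # (batch, K)

        components = []
        for layer_idx, h in enumerate(hidden_states):
            for k in range(self.softmaxes_per_layer):
                proj = self.projections[layer_idx * self.softmaxes_per_layer + k]
                components.append(torch.softmax(proj(h), dim=-1))
        components = torch.stack(components, dim=1)  # (batch, K, vocab)

        # Weighted sum of component distributions -> final word distribution.
        return torch.einsum("bk,bkv->bv", pi, components)


# Toy usage: three RNN layers, one softmax per layer.
if __name__ == "__main__":
    batch, hidden, vocab = 4, 32, 100
    layer_states = [torch.randn(batch, hidden) for _ in range(3)]
    doc = DirectOutputConnectionSketch(hidden, vocab, num_layers=3)
    probs = doc(layer_states)
    print(probs.shape, probs.sum(dim=-1))  # (4, 100); each row sums to ~1
```

In the full model reported below, this output layer is built on top of an AWD-LSTM language model (hence the AWD-LSTM-DOC entries in the results); see the linked repository for the complete training setup.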

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Constituency Parsing | Penn Treebank | LSTM Encoder-Decoder + LSTM-LM | F1 score | 94.47 | #15 |
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DOC x5 | Validation perplexity | 48.63 | #8 |
| | | | Test perplexity | 47.17 | #8 |
| | | | Params | 185M | #4 |
| Language Modelling | Penn Treebank (Word Level) | AWD-LSTM-DOC | Validation perplexity | 54.12 | #14 |
| | | | Test perplexity | 52.38 | #16 |
| | | | Params | 23M | #19 |
| Language Modelling | WikiText-2 | AWD-LSTM-DOC | Validation perplexity | 60.29 | #17 |
| | | | Test perplexity | 58.03 | #24 |
| | | | Number of params | 37M | #9 |
| Language Modelling | WikiText-2 | AWD-LSTM-DOC x5 | Validation perplexity | 54.19 | #12 |
| | | | Test perplexity | 53.09 | #20 |
| | | | Number of params | 185M | #6 |

Methods


No methods listed for this paper.