Mogrifier LSTM

ICLR 2020 · Gábor Melis, Tomáš Kočiský, Phil Blunsom

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
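The mutual gating described above can be made concrete with a short sketch. The following is a minimal PyTorch rendering of the mechanism under the paper's equations (alternating rounds of x and h rescaling each other through 2·sigmoid gates before a standard LSTM update); the class name, the default number of rounds, and the use of a plain nn.LSTMCell core are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MogrifierLSTMCell(nn.Module):
    """LSTM cell whose input x and previous hidden state h mutually
    gate each other for `rounds` alternating steps before the usual
    LSTM state update."""

    def __init__(self, input_size: int, hidden_size: int, rounds: int = 5):
        super().__init__()
        self.rounds = rounds
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        # Q maps h to a gate over x (used on odd rounds, 1-based);
        # R maps x to a gate over h (used on even rounds).
        self.q = nn.ModuleList(
            [nn.Linear(hidden_size, input_size, bias=False)
             for _ in range((rounds + 1) // 2)])
        self.r = nn.ModuleList(
            [nn.Linear(input_size, hidden_size, bias=False)
             for _ in range(rounds // 2)])

    def mogrify(self, x, h):
        for i in range(self.rounds):
            if i % 2 == 0:  # odd rounds in the paper's 1-based indexing
                x = 2 * torch.sigmoid(self.q[i // 2](h)) * x
            else:           # even rounds
                h = 2 * torch.sigmoid(self.r[i // 2](x)) * h
        return x, h

    def forward(self, x, state):
        h, c = state
        x, h = self.mogrify(x, h)  # context-dependent transformation
        return self.lstm(x, (h, c))
```

With rounds = 0 the mogrify step is a no-op and the cell reduces to a standard LSTM, which is what makes the mechanism a strict extension of the baseline.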

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Language Modelling | enwik8 | LSTM | Bit per Character (BPC) | 1.195 | #31 |
| Language Modelling | enwik8 | LSTM | Number of params | 48M | #20 |
| Language Modelling | enwik8 | Mogrifier LSTM | Bit per Character (BPC) | 1.146 | #30 |
| Language Modelling | enwik8 | Mogrifier LSTM | Number of params | 48M | #20 |
| Language Modelling | Hutter Prize | Mogrifier LSTM | Bit per Character (BPC) | 1.122 | #12 |
| Language Modelling | Hutter Prize | Mogrifier LSTM | Number of params | 96M | #5 |
| Language Modelling | Hutter Prize | Mogrifier LSTM + dynamic eval | Bit per Character (BPC) | 0.988 | #3 |
| Language Modelling | Hutter Prize | Mogrifier LSTM + dynamic eval | Number of params | 96M | #5 |
| Language Modelling | Penn Treebank (Character Level) | Mogrifier LSTM + dynamic eval | Bit per Character (BPC) | 1.083 | #1 |
| Language Modelling | Penn Treebank (Character Level) | Mogrifier LSTM + dynamic eval | Number of params | 24M | #3 |
| Language Modelling | Penn Treebank (Character Level) | Mogrifier LSTM | Bit per Character (BPC) | 1.120 | #2 |
| Language Modelling | Penn Treebank (Character Level) | Mogrifier LSTM | Number of params | 24M | #3 |
| Language Modelling | Penn Treebank (Word Level) | Mogrifier LSTM + dynamic eval | Validation perplexity | 44.8 | #2 |
| Language Modelling | Penn Treebank (Word Level) | Mogrifier LSTM + dynamic eval | Test perplexity | 44.9 | #4 |
| Language Modelling | Penn Treebank (Word Level) | Mogrifier LSTM + dynamic eval | Number of params | 24M | #7 |
| Language Modelling | WikiText-2 | Mogrifier LSTM | Validation perplexity | 57.3 | #15 |
| Language Modelling | WikiText-2 | Mogrifier LSTM | Test perplexity | 55.1 | #22 |
| Language Modelling | WikiText-2 | Mogrifier LSTM | Number of params | 35M | #12 |
| Language Modelling | WikiText-2 | Mogrifier LSTM + dynamic eval | Validation perplexity | 40.2 | #3 |
| Language Modelling | WikiText-2 | Mogrifier LSTM + dynamic eval | Test perplexity | 38.6 | #11 |
| Language Modelling | WikiText-2 | Mogrifier LSTM + dynamic eval | Number of params | 35M | #12 |
