Adaptive Input Representations for Neural Language Modeling

We introduce adaptive input representations for neural language modeling, which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. There are several choices of how to factorize the input and output layers, and whether to model words, characters, or sub-word units...
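The core idea can be sketched concretely: partition a frequency-sorted vocabulary into clusters, give frequent clusters full-dimensional embeddings and rare clusters progressively smaller ones, then project every cluster up to the shared model dimension. The cluster boundaries, dimensions, and function names below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16                      # target model dimension
cutoffs = [0, 100, 1000, 10000]   # cluster boundaries over a frequency-sorted vocab
factor = 4                        # each successive cluster shrinks its dim by this factor

# Per-cluster embedding tables and up-projections to d_model.
tables, projections = [], []
for i in range(len(cutoffs) - 1):
    size = cutoffs[i + 1] - cutoffs[i]
    dim = max(d_model // factor**i, 1)          # 16, 4, 1 for the three clusters
    tables.append(rng.standard_normal((size, dim)) * 0.02)
    projections.append(rng.standard_normal((dim, d_model)) * 0.02)

def embed(token_ids):
    """Look up each token in its cluster's table and project to d_model."""
    out = np.zeros((len(token_ids), d_model))
    for j, t in enumerate(token_ids):
        # Token ids are sorted by frequency, so the cluster is found by cutoff.
        i = next(k for k in range(len(cutoffs) - 1) if t < cutoffs[k + 1])
        row = tables[i][t - cutoffs[i]]
        out[j] = row @ projections[i]           # variable dim -> shared d_model
    return out

vecs = embed([5, 500, 5000])      # one token from each cluster
print(vecs.shape)                 # (3, 16): every token reaches the model dimension
```

Rare words thus consume far fewer parameters (a 1-dimensional row plus a small projection) than frequent ones, which is what keeps the input layer's capacity adaptive.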

Published at ICLR 2019.
| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Language Modelling | One Billion Word | Adaptive Input Large | PPL | 23.91 | #4 |
| Language Modelling | One Billion Word | Adaptive Input Large | Number of params | 0.46B | #1 |
| Language Modelling | One Billion Word | Adaptive Input Large | Validation perplexity | 23.83 | #2 |
| Language Modelling | One Billion Word | Adaptive Input Very Large | PPL | 23.02 | #2 |
| Language Modelling | One Billion Word | Adaptive Input Very Large | Number of params | 1.0B | #1 |
| Language Modelling | One Billion Word | Adaptive Input Very Large | Validation perplexity | 22.92 | #1 |
| Language Modelling | WikiText-103 | Transformer (Adaptive inputs) | Validation perplexity | 17.97 | #6 |
| Language Modelling | WikiText-103 | Transformer (Adaptive inputs) | Test perplexity | 18.70 | #13 |
| Language Modelling | WikiText-103 | Transformer (Adaptive inputs) | Number of params | 247M | #7 |

Methods used in the Paper