Exploring the Limits of Language Modeling

7 Feb 2016 · Rafal Jozefowicz • Oriol Vinyals • Mike Schuster • Noam Shazeer • Yonghui Wu

In this work we explore recent advances in Recurrent Neural Networks for large-scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language. We perform an exhaustive study of techniques such as character Convolutional Neural Networks and Long Short-Term Memory networks on the One Billion Word Benchmark.
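The best-performing models in the evaluation below feed a character-level CNN into a large LSTM with a projection layer. The following is a minimal PyTorch sketch of that kind of architecture, not the authors' TensorFlow implementation: the layer sizes, the class name, and the full-softmax output are illustrative assumptions (the paper's LSTM-8192-1024 uses an 8192-unit LSTM with a 1024-dim projection and a sampled softmax over the ~800k-word vocabulary).

```python
# Illustrative sketch of a char-CNN-input LSTM language model (assumed sizes).
import torch
import torch.nn as nn

class CharCNNLSTMLM(nn.Module):
    def __init__(self, n_chars=256, char_dim=16, n_filters=128,
                 kernel_size=5, hidden=512, proj=256, vocab_size=10000):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Character CNN: convolve over the characters of each word,
        # then max-pool to a fixed-size word representation.
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=2)
        # LSTM with a projection layer (proj_size requires PyTorch >= 1.8);
        # the paper's model uses 8192 hidden units projected to 1024.
        self.lstm = nn.LSTM(n_filters, hidden, proj_size=proj, batch_first=True)
        # Full softmax here for simplicity; the paper uses a sampled softmax
        # to cope with the One Billion Word vocabulary size.
        self.out = nn.Linear(proj, vocab_size)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, word_len) character indices
        b, t, w = char_ids.shape
        x = self.char_emb(char_ids.view(b * t, w))                  # (b*t, w, char_dim)
        x = self.conv(x.transpose(1, 2)).relu().max(dim=2).values   # (b*t, n_filters)
        x = x.view(b, t, -1)
        h, _ = self.lstm(x)                                         # (b, t, proj)
        return self.out(h)                                          # next-word logits

logits = CharCNNLSTMLM()(torch.randint(0, 256, (2, 20, 12)))
print(logits.shape)  # torch.Size([2, 20, 10000])
```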


Evaluation


| Task | Dataset | Model | PPL | Number of params | Global rank |
|------|---------|-------|-----|------------------|-------------|
| Language Modelling | One Billion Word | LSTM-8192-1024 + CNN Input | 30.0 | 1.04B | #8 |
| Language Modelling | One Billion Word | LSTM-8192-1024 | 30.6 | 1.8B | #9 |
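The PPL column reports perplexity, the exponential of the average per-word negative log-likelihood on the held-out set. A minimal sketch of that relationship (the per-word values below are made up for illustration, not taken from the paper):

```python
# Perplexity from per-word negative log-likelihoods (hypothetical values, in nats).
import math

neg_log_likelihoods = [3.2, 3.6, 3.4]
ppl = math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))
print(round(ppl, 1))  # ~30.0
```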