23 Oct 2018 • Mostofa Patwary, Milind Chabbi, Heewoo Jun, Jiaji Huang, Gregory Diamos, Kenneth Church
We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs.
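For readers unfamiliar with Zipf's Law, the sketch below illustrates the law itself: in natural-language text, the frequency of the word at rank r is roughly proportional to 1/r^s, with s close to 1. It is not the authors' scaling method, which this summary does not describe. The function names `rank_frequency` and `zipf_expected` and the toy token list are hypothetical and introduced only for illustration.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


def zipf_expected(top_freq, rank, s=1.0):
    """Count predicted for a given rank by an idealized Zipf curve.

    Assumes count(rank) = top_freq / rank**s; s = 1.0 is the classic
    form of the law, and real corpora only follow it approximately.
    """
    return top_freq / rank ** s


if __name__ == "__main__":
    # Toy corpus (an assumption for illustration, not data from the paper).
    tokens = "a a a a b b c".split()
    for rank, word, freq in rank_frequency(tokens):
        print(rank, word, freq, zipf_expected(4, rank))
```

On a real corpus, comparing the observed counts with `zipf_expected` shows the heavy skew the law describes: a handful of words account for most tokens, while most of the vocabulary is rare.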