Dynamic Evaluation of Transformer Language Models

17 Apr 2019 · Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals

This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation. Transformers use stacked layers of self-attention that allow them to capture long range dependencies in sequential data. Dynamic evaluation fits models to the recent sequence history, allowing them to assign higher probabilities to re-occurring sequential patterns. By applying dynamic evaluation to Transformer-XL models, the authors improve the state of the art on enwik8 from 0.99 to 0.94 bits/char, on text8 from 1.08 to 1.04 bits/char, and on WikiText-103 from 18.3 to 16.4 perplexity.
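To make the idea concrete, the sketch below shows one way dynamic evaluation can be applied at test time: score each segment of the evaluation text with the current weights, then take a small gradient step on that segment so the model adapts to the recent history before scoring the next one. It is loosely modelled on the paper's "RMS dynamic eval + decay" variant, but everything here is an illustrative assumption: the `model(inputs)` interface, the segment length, the hyperparameters, and the fact that the per-parameter RMS scaling is accumulated online (the paper derives its scaling from gradient statistics collected on the training set).

import torch
import torch.nn.functional as F


def dynamic_eval_bpc(model, token_ids, seg_len=128, lr=1e-4, lam=1e-3, eps=1e-5):
    """Evaluate `model` on `token_ids` (1-D LongTensor) while adapting it online.

    For each segment: (1) score it with the current weights, accumulating the
    log-loss that defines the reported bits-per-character, and (2) take one
    RMS-scaled gradient step on that segment, decaying the weights back toward
    their starting values so the adaptation cannot drift arbitrarily far.
    """
    theta0 = [p.detach().clone() for p in model.parameters()]  # pre-adaptation weights
    ms = [torch.zeros_like(p) for p in model.parameters()]     # running mean-square of gradients

    total_nll, total_tokens = 0.0, 0
    model.train()  # gradients are needed even though this is evaluation

    for start in range(0, token_ids.numel() - 1, seg_len):
        inputs = token_ids[start:start + seg_len].unsqueeze(0)            # (1, T)
        targets = token_ids[start + 1:start + seg_len + 1].unsqueeze(0)   # (1, T), shifted by one
        if inputs.size(1) != targets.size(1):
            break  # drop a ragged tail segment for simplicity

        logits = model(inputs)  # assumed shape: (1, T, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))

        # Accumulate the evaluation metric *before* updating on this segment,
        # so each token is predicted only from earlier text.
        total_nll += loss.item() * targets.numel()
        total_tokens += targets.numel()

        # One adaptation step on the segment just scored.
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, p0, m in zip(model.parameters(), theta0, ms):
                if p.grad is None:
                    continue
                m.mul_(0.9).addcmul_(p.grad, p.grad, value=0.1)   # update running RMS
                p.add_(-lr * p.grad / (m.sqrt() + eps))            # RMS-scaled gradient step
                p.add_(lam * (p0 - p))                             # decay toward original weights

    # Average negative log-likelihood in nats, converted to bits per token.
    return total_nll / total_tokens / torch.log(torch.tensor(2.0)).item()

With character-level inputs the returned value is directly comparable to the bits-per-character numbers reported below; the same loop applies to word-level data, where the exponentiated nats-per-token loss gives perplexity.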


Evaluation results from the paper


Task | Dataset | Model | Metric | Value | Global rank
Language Modelling | enwik8 | Transformer-XL + RMS dynamic eval + decay | Bit per Character (BPC) | 0.940 | #2
Language Modelling | enwik8 | Transformer-XL + RMS dynamic eval + decay | Number of params | 277M | #1
Language Modelling | Text8 | Transformer-XL + RMS dynamic eval + decay | Bit per Character (BPC) | 1.038 | #2
Language Modelling | Text8 | Transformer-XL + RMS dynamic eval + decay | Number of params | 277M | #1
Language Modelling | WikiText-103 | Transformer-XL with dynamic evaluation | Validation perplexity | 15.8 | #1
Language Modelling | WikiText-103 | Transformer-XL with dynamic evaluation | Test perplexity | 16.4 | #1
Language Modelling | WikiText-103 | Transformer-XL with dynamic evaluation | Number of params | 257M | #1