ICLR 2019 • Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Moreover, Transformer-XL is up to 1,800+ times faster than the vanilla Transformer during evaluation.
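The evaluation speedup comes from Transformer-XL caching the hidden states of previous segments as memory, so each new token is processed once instead of the whole context being recomputed at every prediction step. A minimal back-of-envelope sketch of this cost difference (not the authors' code; the function names and the fixed-window assumption are illustrative):

```python
def vanilla_eval_cost(n_tokens: int, window: int) -> int:
    """Vanilla Transformer evaluation: to predict each next token, all
    positions inside the attention window are re-processed from scratch,
    so the per-step cost is the current window size."""
    return sum(min(t + 1, window) for t in range(n_tokens))


def xl_eval_cost(n_tokens: int) -> int:
    """Transformer-XL evaluation: hidden states of earlier tokens are
    cached as segment-level memory, so each step processes only the
    single new token (attention reads the cache instead of recomputing)."""
    return n_tokens


# Illustrative comparison for a 1,000-token sequence and a 512-token window:
vanilla = vanilla_eval_cost(1000, 512)   # 381,184 position computations
cached = xl_eval_cost(1000)              # 1,000 position computations
speedup = vanilla / cached
```

This toy count captures only the recomputation factor; the reported 1,800x figure additionally depends on sequence length, memory length, and hardware.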
Language Modelling