Transformer-XL: Language Modeling with Longer-Term Dependency

ICLR 2019 · Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

We propose a novel neural architecture, Transformer-XL, for modeling longer-term dependency. To address the limitation of fixed-length contexts, we introduce a notion of recurrence by reusing the representations from the history...
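The recurrence mechanism mentioned above can be sketched as follows: the hidden states computed for the previous segment are cached and reused as extended attention context for the current segment. This is a minimal NumPy illustration of that idea, not the paper's implementation; the function name and shapes are assumptions for the example.

```python
import numpy as np

def attend_with_memory(h_curr, mem):
    """Single-head attention where keys/values include cached memory.

    Queries come only from the current segment; keys and values come
    from the concatenation [memory; current segment], so each position
    can attend beyond the segment boundary.
    """
    context = h_curr if mem is None else np.concatenate([mem, h_curr], axis=0)
    scores = h_curr @ context.T / np.sqrt(h_curr.shape[-1])
    # numerically stable softmax over the extended context
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context

rng = np.random.default_rng(0)
segments = [rng.standard_normal((4, 8)) for _ in range(3)]  # (seg_len, d_model)

mem = None
for segment in segments:
    out = attend_with_memory(segment, mem)
    # cache the current segment's representations for the next segment
    # (in the paper, gradients are stopped through this cache)
    mem = segment
```

In the full model this caching is applied per layer, and relative positional encodings keep attention consistent across the reused states.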

No code implementations yet.
