Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

ACL 2019 huggingface/pytorch-transformers

Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.

LANGUAGE MODELLING
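A minimal sketch of using Transformer-XL through the linked pytorch-transformers repository, assuming the `transfo-xl-wt103` checkpoint and the 2019-era API in which the returned `mems` carry the segment-level recurrence; treat the exact call signatures as an assumption rather than a verbatim recipe:

```python
import torch
from pytorch_transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

# Load the WikiText-103 checkpoint (names assumed from the 2019 pytorch-transformers release).
tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')
model.eval()

text = "Transformers have the potential to learn longer-term dependencies ."
input_ids = torch.tensor([tokenizer.encode(text)])

# Split the input into two segments; the mems returned for the first segment are fed
# back so the second segment can attend beyond its own fixed-length window.
seg1, seg2 = input_ids[:, :5], input_ids[:, 5:]
with torch.no_grad():
    scores1, mems = model(seg1)[:2]             # mems cache hidden states of segment 1
    scores2, mems = model(seg2, mems=mems)[:2]  # segment 2 reuses them as extended context
```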

Language Models are Unsupervised Multitask Learners

Preprint 2019 huggingface/pytorch-transformers

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.

 SOTA for Language Modelling on Text8 (using extra training data)

COMMON SENSE REASONING, DOCUMENT SUMMARIZATION, LANGUAGE MODELLING, MACHINE TRANSLATION, QUESTION ANSWERING, READING COMPREHENSION
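A hedged usage sketch for the GPT-2 checkpoint shipped with the same repository, showing the single unsupervised language-modelling interface (next-token prediction) that the paper applies zero-shot across these tasks; the `gpt2` checkpoint name and class names follow the 2019 pytorch-transformers release, and the exact signatures are an assumption:

```python
import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

# Greedy continuation: the model is trained only on the unsupervised LM objective,
# yet the same interface is prompted for summarization, translation, QA, etc.
input_ids = torch.tensor([tokenizer.encode("Machine translation is the task of")])
with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids)[0]                          # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
```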

Efficient softmax approximation for GPUs

ICML 2017 huggingface/pytorch-transformers

We propose an approximate strategy to efficiently train neural-network-based language models over very large vocabularies.
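The adaptive softmax idea is also exposed directly in PyTorch as `torch.nn.AdaptiveLogSoftmaxWithLoss`; a minimal sketch of training-time use, with hidden size, vocabulary size, and cutoffs chosen arbitrarily for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

# Hidden size and a large vocabulary, as in the very-large-vocabulary setting.
hidden_dim, vocab_size = 512, 250_000

# Frequent words stay in the head; rarer words are pushed into progressively
# smaller tail clusters (cutoffs are illustrative, not the paper's configuration).
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[2_000, 20_000, 100_000],
    div_value=4.0,
)

hidden = torch.randn(32, hidden_dim)           # e.g. RNN/Transformer outputs
targets = torch.randint(0, vocab_size, (32,))  # gold next-word indices

out = adaptive_softmax(hidden, targets)
print(out.loss)                                # mean negative log-likelihood
log_probs = adaptive_softmax.log_prob(hidden)  # full (32, vocab_size) log-probabilities
```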
