Language modeling is the task of predicting the next word or character in a document.
(Image credit: Exploring the Limits of Language Modeling)
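As a minimal illustration of the task (the vocabulary and scores below are invented for demonstration), a language model produces a probability distribution over the vocabulary for the next token given the preceding context:

```python
import numpy as np

# Toy next-token prediction: a model scores every vocabulary item as the
# possible continuation of a context and normalizes the scores with a softmax.
vocab = ["the", "cat", "sat", "on", "mat"]      # hypothetical tiny vocabulary
logits = np.array([1.2, 0.3, 2.5, -0.7, 0.1])   # scores some model might emit

probs = np.exp(logits - logits.max())
probs /= probs.sum()

# P(next token | context) under this toy model
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
```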
Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
Ranked #6 on Question Answering on Quora Question Pairs
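A toy sketch of the replaced-token-detection setup described above; the token ids and the stand-in "generator" are fabricated for illustration, not the paper's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

original = np.array([12, 7, 45, 3, 88, 19])   # hypothetical token ids
mask = rng.random(original.shape) < 0.3       # positions handed to the generator

# A stand-in generator proposes replacements for the masked positions;
# sometimes it happens to reproduce the original token.
generator_samples = np.where(
    rng.random(original.shape) < 0.5,
    original,
    rng.integers(0, 100, original.shape),
)
corrupted = np.where(mask, generator_samples, original)

# The discriminator's target: was this token replaced by a generator sample?
# Tokens the generator happens to reproduce exactly count as "original".
labels = (corrupted != original).astype(np.int64)
print(corrupted, labels)
```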
Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model.
We propose a new benchmark corpus to be used for measuring progress in statistical language modeling.
Ranked #20 on Language Modelling on One Billion Word
We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
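A minimal numpy sketch of the projection-across-heads idea; the shapes, random weights, and lack of training here are illustrative and not taken from the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

batch, heads, q_len, k_len, d = 2, 4, 5, 5, 8
rng = np.random.default_rng(0)

q = rng.normal(size=(batch, heads, q_len, d))
k = rng.normal(size=(batch, heads, k_len, d))
v = rng.normal(size=(batch, heads, k_len, d))

logits = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(d)

# Talking-heads: mix information across the heads dimension with learned
# matrices immediately before and after the softmax (random here).
W_logits = rng.normal(size=(heads, heads))   # pre-softmax head projection
W_probs = rng.normal(size=(heads, heads))    # post-softmax head projection

logits = np.einsum("bhqk,hg->bgqk", logits, W_logits)
probs = softmax(logits, axis=-1)
probs = np.einsum("bhqk,hg->bgqk", probs, W_probs)

out = np.einsum("bhqk,bhkd->bhqd", probs, v)
print(out.shape)  # (2, 4, 5, 8)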
Traditional NLP has long held (supervised) syntactic parsing necessary for successful higher-level language understanding.
Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.
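A toy sketch of the permutation idea: each token is predicted conditioned only on tokens that come earlier in a randomly sampled factorization order. This ignores XLNet's two-stream attention and is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 6
order = rng.permutation(seq_len)   # random factorization order over positions

# rank[i] = where position i falls in the sampled order
rank = np.empty(seq_len, dtype=int)
rank[order] = np.arange(seq_len)

# mask[i, j] == True -> when predicting token i, it may condition on token j,
# i.e. j appears earlier in the factorization order.
mask = rank[None, :] < rank[:, None]
print(order)
print(mask.astype(int))
```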
To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.
Ranked #4 on Language Modelling on Hutter Prize
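A toy sketch of the local sliding-window attention pattern that gives the Longformer-style linear scaling described above; the window size is arbitrary and the global-attention tokens used in the actual model are omitted:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Each token attends only to tokens within +/- `window` positions,
    so the number of attended pairs grows linearly with seq_len."""
    i = np.arange(seq_len)
    return np.abs(i[:, None] - i[None, :]) <= window

mask = sliding_window_mask(seq_len=10, window=2)
print(mask.astype(int))
print("attended pairs:", mask.sum())  # ~ seq_len * (2 * window + 1)
```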
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences.
Ranked #2 on Open-Domain Question Answering on SearchQA