1152 papers with code • 12 benchmarks • 117 datasets
Language modeling is the task of predicting the next word or character in a document.
(Image credit: Exploring the Limits of Language Modeling)
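To make the task definition concrete, here is a minimal next-token prediction sketch. The use of PyTorch and the Hugging Face `gpt2` checkpoint is an illustrative assumption, not something prescribed by this page.

```python
# Minimal sketch of next-token prediction with a pretrained causal LM.
# Model and library choices here are illustrative, not part of the page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The capital of France is"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The distribution over the next token is read off the last position.
next_token_id = int(torch.argmax(logits[0, -1]))
print(tokenizer.decode([next_token_id]))
```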
Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities.
We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high- and low-resource languages at scale.
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging.
Ranked #9 on Semantic Textual Similarity on MRPC
In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.
Ranked #1 on Visual Question Answering on VizWiz 2018
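The three-encoder layout described above can be sketched schematically as follows. All class names, dimensions, and layer counts are illustrative assumptions, not the actual LXMERT implementation.

```python
# Schematic sketch of a three-encoder cross-modal design (object-relationship,
# language, and cross-modality encoders). Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class CrossModalEncoder(nn.Module):
    def __init__(self, d_model=768, nhead=12, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.object_enc = nn.TransformerEncoder(layer, num_layers)  # object-relationship encoder
        self.lang_enc = nn.TransformerEncoder(layer, num_layers)    # language encoder
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, object_feats, token_embs):
        v = self.object_enc(object_feats)   # contextualize detected object regions
        l = self.lang_enc(token_embs)       # contextualize word embeddings
        # Cross-modality step: language queries attend over visual keys/values.
        fused, _ = self.cross_attn(l, v, v)
        return fused

# Toy usage: 36 object regions and 20 tokens, both projected to d_model=768.
fused = CrossModalEncoder()(torch.randn(1, 36, 768), torch.randn(1, 20, 768))
print(fused.shape)  # torch.Size([1, 20, 768])
```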
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.
Ranked #1 on Sentiment Analysis on Yelp Binary classification
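A toy sketch of the two objectives being contrasted above: autoregressive next-token prediction versus denoising autoencoding over corrupted positions. The tensors and loss computations are illustrative assumptions, not BERT or XLNet training code.

```python
# Contrast of pretraining objectives; random logits stand in for model outputs.
import torch
import torch.nn.functional as F

vocab_size = 100
tokens = torch.tensor([5, 17, 42, 8, 23])        # a toy token sequence
logits = torch.randn(len(tokens), vocab_size)    # stand-in for per-position model outputs

# Autoregressive LM: each position predicts the *next* token from left context only.
ar_loss = F.cross_entropy(logits[:-1], tokens[1:])

# Denoising autoencoding (BERT-style): corrupt some positions, then predict the
# original tokens there using bidirectional context around them.
masked_positions = torch.tensor([1, 3])
dae_loss = F.cross_entropy(logits[masked_positions], tokens[masked_positions])

print(ar_loss.item(), dae_loss.item())
```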
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.
Ranked #1 on Language Modelling on enwik8 (using extra training data)
On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU.