RoBERTa

Introduced by Liu et al. in RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa is an extension of BERT with changes to the pretraining procedure. The modifications include:

training the model longer, with bigger batches, over more data
removing the next sentence prediction objective
training on longer sequences
dynamically changing the masking pattern applied to the training data

The authors also collect a large new dataset (CC-News) of comparable size to other privately used datasets, to better control for training set size effects.
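
The dynamic masking item in the list above can be illustrated with a minimal sketch. It assumes the commonly used BERT masking settings (15% of tokens selected; of those, 80% replaced by a mask token, 10% by a random token, 10% left unchanged), none of which are stated on this page. The function name `dynamic_mask`, the `MASK` constant, and the toy whitespace tokenization are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol for this toy example


def dynamic_mask(tokens, rng, mask_prob=0.15, vocab=None):
    """Sample a fresh mask for one pass over `tokens`.

    Returns (masked_tokens, labels). labels[i] holds the original token
    at selected positions and None elsewhere.
    The 15% rate and 80/10/10 split are assumed BERT-style defaults.
    """
    vocab = vocab or tokens  # toy vocabulary: fall back to the input tokens
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model would be asked to predict this token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK  # replace with the mask symbol
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # replace with a random token
        # otherwise keep the original token in place
    return masked, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
rng = random.Random(0)

# Static masking would call dynamic_mask once and reuse the result.
# Dynamic masking re-samples every time the sequence is fed to the model.
for epoch in range(3):
    masked, _ = dynamic_mask(tokens, rng)
    print(epoch, " ".join(masked))
```

Because the mask is drawn inside the training loop rather than once during preprocessing, the same sentence is generally seen with different masked positions across epochs.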

Source: RoBERTa: A Robustly Optimized BERT Pretraining Approach

Task	Papers	Share
Language Modelling	76	9.26%
Sentence	58	7.06%
Sentiment Analysis	41	4.99%
Question Answering	32	3.90%
Text Classification	32	3.90%
Classification	24	2.92%
Natural Language Understanding	17	2.07%
Named Entity Recognition (NER)	16	1.95%
NER	15	1.83%

Component	Type
BERT	Language Models