RoBERTa is an extension of BERT with changes to the pretraining procedure. The modifications include:

  • training the model longer, with bigger batches, over more data
  • removing the next sentence prediction objective
  • training on longer sequences
  • dynamically changing the masking pattern applied to the training data

The authors also collect a large new dataset (CC-News), comparable in size to other privately used datasets, to better control for training-set-size effects.
Source: RoBERTa: A Robustly Optimized BERT Pretraining Approach
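The last bullet, dynamic masking, means the mask is re-sampled every time a sequence is fed to the model, rather than fixed once during preprocessing as in the original BERT. A minimal sketch of this idea is below; the token ids, `MASK_ID`, and vocabulary size are placeholders, not values from the paper, and the 80/10/10 replacement split follows BERT's standard masked-language-modeling recipe.

```python
import random

MASK_ID = 103       # hypothetical [MASK] token id (assumption)
VOCAB_SIZE = 1000   # hypothetical vocabulary size (assumption)

def dynamic_mask(token_ids, mask_prob=0.15, rng=None):
    """Return a freshly masked copy of token_ids plus MLM labels.

    Called each time a sequence is batched, so every epoch sees a
    different masking pattern (dynamic masking), unlike static masking
    where the pattern is fixed once at preprocessing time.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [-100] * len(inputs)  # -100 = position ignored by the MLM loss
    for i, tok in enumerate(inputs):
        if rng.random() < mask_prob:
            labels[i] = tok            # predict the original token here
            r = rng.random()
            if r < 0.8:                # 80%: replace with [MASK]
                inputs[i] = MASK_ID
            elif r < 0.9:              # 10%: replace with a random token
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Because masking happens at batching time, two draws of the same sequence produce different masked inputs, which is the behavior the paper contrasts with BERT's static, preprocessed masks.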

Task                            Papers  Share
Language Modelling                  74  9.26%
Sentiment Analysis                  38  4.76%
Question Answering                  33  4.13%
Text Classification                 31  3.88%
Classification                      25  3.13%
Natural Language Understanding      24  3.00%
Named Entity Recognition (NER)      20  2.50%
NER                                 16  2.00%
Test                                16  2.00%


Type: Language Models