Transformers

RoBERTa is an extension of BERT with changes to the pretraining procedure. The modifications include:

  • training the model longer, with bigger batches, over more data
  • removing the next sentence prediction objective
  • training on longer sequences
  • dynamically changing the masking pattern applied to the training data (see the sketch below)

The authors also collect a large new dataset (CC-News) of comparable size to other privately used datasets, to better control for training set size effects.
Source: RoBERTa: A Robustly Optimized BERT Pretraining Approach
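
The dynamic masking change is the most mechanical of the four, so a minimal sketch may help. The snippet below is plain Python; the token ids, special-token ids, and the 80/10/10 replacement split are placeholders following BERT's masking convention (which RoBERTa retains), not a real tokenizer. The point is that the mask is regenerated every time a sequence is drawn, rather than fixed once during preprocessing as in BERT's static masking:

```python
import random

# Illustrative placeholder ids; a real RoBERTa vocabulary defines its own
# special tokens, so treat these values as assumptions for the sketch.
MASK_ID = 4
VOCAB_SIZE = 50265           # size of RoBERTa's BPE vocabulary
SPECIAL_IDS = {0, 1, 2, 3}   # e.g. <s>, <pad>, </s>, <unk>

def dynamic_mask(token_ids, mask_prob=0.15, rng=random):
    """Return (inputs, labels) with a fresh random mask on every call.

    Because this runs each time a sequence is batched, the model sees a
    different masking pattern across epochs, unlike static masking where
    the pattern is fixed once during preprocessing.
    """
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)   # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if tok in SPECIAL_IDS or rng.random() >= mask_prob:
            continue
        labels[i] = tok                # predict the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID        # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

# The same sequence gets a different mask each time it is sampled:
seq = [0, 8, 2387, 345, 90, 2]
print(dynamic_mask(seq))
print(dynamic_mask(seq))
```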

Tasks


Task                             Papers   Share
Language Modelling                   76   8.96%
Sentence                             56   6.60%
Sentiment Analysis                   42   4.95%
Text Classification                  33   3.89%
Question Answering                   33   3.89%
Classification                       24   2.83%
Named Entity Recognition (NER)       19   2.24%
NER                                  18   2.12%
Natural Language Understanding       16   1.89%

Components


Component   Type
BERT        Language Models
