RoBERTa Common Sense QA

Last updated on Mar 15, 2021

Parameters: 356 million
File size: 1.22 GB
Training data: CommonsenseQA
Training techniques: AdamW
Architecture: Dropout, Layer Normalization, Linear Layer, RoBERTa, Tanh
Learning rate: 1e-05
Epochs: 20


This is a multiple-choice model built on RoBERTa, which follows the BERT architecture. It pairs the question with each alternative, computes a score for each resulting sequence from the representation of its CLS token, and chooses the alternative with the highest score.
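The scoring step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the model's actual implementation: the function names, the tiny 2-dimensional vectors, and all weights are made up, and dropout (which only matters during training) is left out. It shows a tanh pooling layer applied to each sequence's CLS vector, a linear layer that maps the pooled vector to one score, and an argmax over the alternatives.

```python
import math
from typing import List, Sequence, Tuple


def pool(cls_vector: Sequence[float],
         weights: Sequence[Sequence[float]],
         bias: Sequence[float]) -> List[float]:
    """Dense layer followed by tanh, applied to one CLS vector."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, cls_vector)) + b)
        for row, b in zip(weights, bias)
    ]


def score(pooled: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear layer that maps a pooled vector to a single scalar score."""
    return sum(w * x for w, x in zip(weights, pooled)) + bias


def choose_alternative(cls_vectors, pool_w, pool_b, out_w, out_b) -> Tuple[int, List[float]]:
    """Score every alternative and return (index of best, all scores)."""
    scores = [score(pool(v, pool_w, pool_b), out_w, out_b) for v in cls_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical toy inputs: one CLS vector per (question, alternative) sequence.
cls_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pool_w = [[1.0, 0.0], [0.0, 1.0]]  # made-up pooler weights (identity)
pool_b = [0.0, 0.0]
out_w = [1.0, -1.0]                # made-up scoring weights
out_b = 0.0

best, scores = choose_alternative(cls_vectors, pool_w, pool_b, out_w, out_b)
print(best, scores)
```

In the real model the CLS vectors come from the RoBERTa encoder and the weights are learned; only the shape of the computation is meant to carry over.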

How do I load this model?

from allennlp_models.pretrained import load_predictor
predictor = load_predictor("mc-roberta-commonsenseqa")

Getting predictions

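question = "If I am tilting a drink toward my face, what should I do before the liquid spills over?"
alternatives = ["open mouth", "eat first", "use glass"]
preds = predictor.predict(question, alternatives)
# Assumes the output dict exposes the index of the top-scoring alternative as "best_alternative".
print(alternatives[preds["best_alternative"]])
# prints: open mouth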
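If the prediction output also exposes the raw per-alternative scores (for example under a key such as "logits", which is an assumption and not stated in this card), they can be turned into probabilities with a softmax. The sketch below uses made-up scores rather than real model output, so it runs without loading the model.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


alternatives = ["open mouth", "eat first", "use glass"]
example_logits = [2.0, 0.5, -1.0]  # hypothetical scores, not real model output

probs = softmax(example_logits)
for alternative, prob in zip(alternatives, probs):
    print(f"{alternative}: {prob:.3f}")
```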

You can also get predictions using the AllenNLP command-line interface:

echo '{"prefix": "If I am tilting a drink toward my face, what should I do before the liquid spills over?",' \
    '"alternatives": ["open mouth", "eat first", "use glass"]}' | \
    allennlp predict -
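Hand-quoting JSON in the shell is easy to get wrong once a question contains quotes or apostrophes. As a small sketch, the input line can be generated with Python's json module instead. The "prefix" and "alternatives" keys come from the command above; the helper name make_input_line is hypothetical.

```python
import json
from typing import Sequence


def make_input_line(prefix: str, alternatives: Sequence[str]) -> str:
    """Serialize one multiple-choice example as a single JSON line."""
    return json.dumps({"prefix": prefix, "alternatives": list(alternatives)})


line = make_input_line(
    "If I am tilting a drink toward my face, what should I do before the liquid spills over?",
    ["open mouth", "eat first", "use glass"],
)
print(line)
```

The printed line can then be piped to the predict command or written to a file with one example per line.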

How do I train this model?

To train this model, you can use the AllenNLP CLI and the configuration file commonsenseqa.jsonnet:

allennlp train commonsenseqa.jsonnet -s output_dir

See the AllenNLP Training and prediction guide for more details.


Citation

Y. Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv, abs/1907.11692, 2019.