RoBERTa SWAG

Last updated on Mar 15, 2021

Parameters: 356 Million
File Size: 1.23 GB
Training Data: SWAG
Training Techniques: AdamW
Architecture: Dropout, Layer Normalization, Linear Layer, RoBERTa, Tanh
LR: 1e-05
Epochs: 20

Summary

This is a multiple-choice model patterned after the BERT architecture. For each alternative, it computes a score from the representation of that sequence's CLS token, and then it chooses the alternative with the highest score.
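The scoring step described above can be illustrated with a toy example in plain Python. This is only a sketch under stated assumptions, not the model's actual implementation: it assumes each alternative has already been encoded into a CLS vector, that a tanh pooling layer is followed by a single linear layer producing one score (consistent with the Tanh and Linear Layer entries in the architecture list), and all names and weight values here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pool(cls_vec: Sequence[float],
         weight: Sequence[Sequence[float]],
         bias: Sequence[float]) -> List[float]:
    """Assumed pooling step: tanh(W @ cls_vec + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, cls_vec)) + b)
        for row, b in zip(weight, bias)
    ]


def score(pooled_vec: Sequence[float],
          out_weight: Sequence[float],
          out_bias: float) -> float:
    """Linear layer that maps a pooled vector to one scalar score."""
    return sum(w * x for w, x in zip(out_weight, pooled_vec)) + out_bias


def choose(cls_vectors: Sequence[Sequence[float]],
           pool_w: Sequence[Sequence[float]],
           pool_b: Sequence[float],
           out_w: Sequence[float],
           out_b: float) -> Tuple[int, List[float]]:
    """Score every alternative and return (index of best, all scores)."""
    scores = [score(pool(v, pool_w, pool_b), out_w, out_b) for v in cls_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical 2-dimensional CLS vectors, one per alternative.
cls_vectors = [[0.2, -0.1], [0.9, 0.4]]
# Hypothetical weights: identity pooling matrix, all-ones output layer.
pool_w = [[1.0, 0.0], [0.0, 1.0]]
pool_b = [0.0, 0.0]
out_w = [1.0, 1.0]
out_b = 0.0

best, scores = choose(cls_vectors, pool_w, pool_b, out_w, out_b)
print(best, scores)
```

In the real model the CLS vectors would come from RoBERTa applied to each candidate sequence; the argmax over per-alternative scores is the part this sketch is meant to show.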

How do I load this model?

from allennlp_models.pretrained import load_predictor
predictor = load_predictor("mc-roberta-swag")

Getting predictions

question = "To separate egg whites from the yolk using a water bottle, you should"
alternatives = [
    "Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.",
    "Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk."
]
preds = predictor.predict(question, alternatives)
print(alternatives[preds["best_alternative"]])
# prints: Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk.

You can also get predictions using the allennlp command line interface:

echo '{"prefix": "To separate egg whites from the yolk using a water bottle, you should",' \
    '"alternatives": [' \
    '"Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.",' \
    '"Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk."' \
    ']}' | \
    allennlp predict https://storage.googleapis.com/allennlp-public-models/swag.2020-07-08.tar.gz -
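The command above pipes a single JSON object with the keys `prefix` and `alternatives` into the predictor. When there are many questions, hand-quoting JSON in the shell becomes error-prone, so one option is to generate the input lines with Python's `json` module. The helper name `make_request` below is hypothetical and not part of AllenNLP; this is a minimal sketch.

```python
import json
from typing import Sequence


def make_request(prefix: str, alternatives: Sequence[str]) -> str:
    """Serialize one multiple-choice question as a single JSON line."""
    return json.dumps({"prefix": prefix, "alternatives": list(alternatives)})


line = make_request(
    "To separate egg whites from the yolk using a water bottle, you should",
    [
        "Squeeze the water bottle and press it against the yolk. "
        "Release, which creates suction and lifts the yolk.",
        "Place the water bottle and press it against the yolk. "
        "Keep pushing, which creates suction and lifts the yolk.",
    ],
)
print(line)
```

Each line produced this way can be written to a file, one question per line, and that file can then be given to `allennlp predict` as the input in place of `-`.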

How do I train this model?

To train this model, you can use the allennlp CLI tool with the configuration file swag.jsonnet:

allennlp train swag.jsonnet -s output_dir

See the AllenNLP Training and prediction guide for more details.

Citation

@article{Liu2019RoBERTaAR,
 author = {Y. Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and M. Lewis and Luke Zettlemoyer and Veselin Stoyanov},
 journal = {ArXiv},
 title = {RoBERTa: A Robustly Optimized BERT Pretraining Approach},
 volume = {abs/1907.11692},
 year = {2019}
}