| Training Techniques | AdamW |
|---|---|
| Architecture | Dropout, Layer Normalization, Linear Layer, RoBERTa, Tanh |
| LR | 0.00001 |
This is a multiple-choice model patterned after the BERT architecture. It computes a score for each (question, alternative) sequence from its CLS token representation and then chooses the alternative with the highest score.
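To make that concrete, here is a minimal sketch of a CLS-based scoring head: each (question, alternative) pair is encoded, its CLS embedding is mapped to a scalar score, and the argmax picks the winner. The class name `MultipleChoiceHead` and the random stand-in embeddings are illustrative only, not part of the AllenNLP API; the real model uses a RoBERTa encoder plus the layers listed in the table above.

```python
# Minimal sketch of a CLS-based multiple-choice scoring head.
# In the real model a RoBERTa encoder produces the CLS embeddings;
# here random tensors stand in for them.
import torch
import torch.nn as nn

class MultipleChoiceHead(nn.Module):  # illustrative name, not the AllenNLP class
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.scorer = nn.Linear(hidden_dim, 1)  # one scalar score per sequence

    def forward(self, cls_embeddings: torch.Tensor) -> torch.Tensor:
        # cls_embeddings: (num_alternatives, hidden_dim), one row per
        # encoded (question, alternative) pair.
        scores = self.scorer(self.dropout(cls_embeddings)).squeeze(-1)
        return scores.argmax(dim=-1)  # index of the highest-scoring alternative

head = MultipleChoiceHead().eval()
with torch.no_grad():
    best = head(torch.randn(3, 768))  # three alternatives
print(int(best))
```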
```python
from allennlp_models.pretrained import load_predictor

# Load the pretrained CommonsenseQA multiple-choice predictor.
predictor = load_predictor("mc-roberta-commonsenseqa")

question = "If I am tilting a drink toward my face, what should I do before the liquid spills over?"
alternatives = ["open mouth", "eat first", "use glass"]

preds = predictor.predict(question, alternatives)

# "best_alternative" holds the index of the highest-scoring alternative.
print(alternatives[preds["best_alternative"]])
# prints: open mouth
```
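If you want more than the chosen index, the returned dictionary can be inspected directly; which extra keys are present depends on the model's forward output, so this sketch simply prints whatever is there:

```python
# Inspect everything the predictor returned; the available keys depend
# on the model's forward() output.
for key, value in preds.items():
    print(key, value)
```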
You can also get predictions using the AllenNLP command line interface:
echo '{"prefix": "If I am tilting a drink toward my face, what should I do before the liquid spills over?",' \
'"alternatives": ["open mouth", "eat first", "use glass"]}' | \
allennlp predict https://storage.googleapis.com/allennlp-public-models/commonsenseqa.2020-07-08.tar.gz -
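For batch prediction, `allennlp predict` also accepts an input file of JSON lines in place of stdin and can write the results to disk via `--output-file`; a sketch, where the file names are placeholders:

```bash
# questions.jsonl holds one {"prefix": ..., "alternatives": [...]} object per line.
allennlp predict \
    https://storage.googleapis.com/allennlp-public-models/commonsenseqa.2020-07-08.tar.gz \
    questions.jsonl \
    --output-file predictions.jsonl \
    --silent
```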
To train this model you can use the `allennlp` CLI tool and the configuration file `commonsenseqa.jsonnet`:
```bash
allennlp train commonsenseqa.jsonnet -s output_dir
```
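If you want to tweak hyperparameters without editing the config file, `allennlp train` accepts a JSON overrides string via `-o`; the values below are illustrative and assume the optimizer lives under `trainer`, as in typical AllenNLP configs:

```bash
allennlp train commonsenseqa.jsonnet -s output_dir \
    -o '{"trainer": {"num_epochs": 3, "optimizer": {"lr": 1e-5}}}'
```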
See the AllenNLP Training and prediction guide for more details.
```bibtex
@article{Liu2019RoBERTaAR,
  author  = {Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov},
  journal = {ArXiv},
  title   = {RoBERTa: A Robustly Optimized BERT Pretraining Approach},
  volume  = {abs/1907.11692},
  year    = {2019}
}
```