Last updated on Mar 15, 2021


Parameters 356 Million
File Size 1.26 GB
Training Data MultiNLI

Training Techniques AdamW
Architecture Dropout, Feedforward Network, Layer Normalization, Linear Layer, RoBERTa, Tanh
LR 0.0
Epochs 3
Dropout 0.1
Batch Size 16


This model implements a basic text classifier. The text is embedded into a text field using a RoBERTa-large model. The resulting sequence is pooled using a cls_pooler Seq2VecEncoder and then passed to a linear classification layer, which projects into the label space.
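The pooling-and-projection step can be sketched in plain Python (a minimal illustration, not the AllenNLP implementation; the dimensions and weights here are made up, and RoBERTa-large actually uses a hidden size of 1024):

```python
import math
import random

random.seed(0)

HIDDEN = 4      # tiny hidden size for illustration only
NUM_LABELS = 3  # entailment, contradiction, neutral

# Fake contextual embeddings for a 5-token sequence, one vector per token.
sequence = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(5)]

# cls_pooler: take the vector of the first (CLS-style) token.
pooled = sequence[0]

# Linear classification layer projecting into the label space: logits = W @ pooled + b
W = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_LABELS)]
b = [0.0] * NUM_LABELS
logits = [sum(w_i * x_i for w_i, x_i in zip(row, pooled)) + b_j
          for row, b_j in zip(W, b)]

# Softmax turns logits into the label_probs the predictor returns.
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]
print(probs)
```

The real model does the same thing with learned weights over RoBERTa-large's contextual embeddings.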

Explore the live Textual Entailment demo at AllenNLP.

How do I load this model?

from allennlp_models.pretrained import load_predictor
predictor = load_predictor("pair-classification-roberta-mnli")

Getting predictions

premise = "A man in a black shirt overlooking bike maintenance."
hypothesis = "A man destroys a bike."
preds = predictor.predict(premise, hypothesis)
labels = ["entailment", "contradiction", "neutral"]  # label order for this model
for label, prob in zip(labels, preds["label_probs"]):
    print(f"p({label}) = {prob:.2%}")
# prints:
# p(entailment) = 1.50%
# p(contradiction) = 81.88%
# p(neutral) = 16.62%
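To recover the single most likely label from `label_probs`, you can take the argmax; a small helper (hypothetical, not part of the AllenNLP API) assuming the same label order as above:

```python
def best_label(label_probs, labels=("entailment", "contradiction", "neutral")):
    """Return the highest-probability label and its probability."""
    idx = max(range(len(label_probs)), key=lambda i: label_probs[i])
    return labels[idx], label_probs[idx]

label, prob = best_label([0.0150, 0.8188, 0.1662])
print(label)  # contradiction
```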

You can also get predictions using the allennlp command-line interface:

echo '{"premise": "A man in a black shirt overlooking bike maintenance.", "hypothesis": "A man destroys a bike."}' | \
    allennlp predict -

How do I evaluate this model?

To evaluate the model on the Multi-genre Natural Language Inference (MultiNLI) dev set, run:

allennlp evaluate \

How do I train this model?

To train this model, you can use the allennlp CLI tool and the configuration file mnli_roberta.jsonnet:

allennlp train mnli_roberta.jsonnet -s output_dir

See the AllenNLP Training and prediction guide for more details.


@article{Liu2019RoBERTa,
 author = {Y. Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and M. Lewis and Luke Zettlemoyer and Veselin Stoyanov},
 journal = {ArXiv},
 title = {RoBERTa: A Robustly Optimized BERT Pretraining Approach},
 volume = {abs/1907.11692},
 year = {2019}
}