RoBERTa SWAG

# Summary

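This is a multiple-choice model patterned after the BERT architecture. Each alternative is paired with the prefix and encoded as one sequence. The model computes a score for each sequence from its CLS token representation (`<s>` in RoBERTa) and then chooses the alternative with the highest score.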
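To make the scoring step concrete, here is a minimal pure-Python sketch of a BERT-style multiple-choice head. It assumes each prefix/alternative pair has already been encoded and reduced to its CLS vector. The vector goes through a dense layer with tanh (a BERT-style pooler), a linear layer maps the result to one score, and the head picks the argmax. The toy weights and vectors are hypothetical. Dropout is omitted because it is inactive at inference. Treat this as an illustration of the idea, not the AllenNLP implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pool(cls_vec: Vector, weight: Sequence[Vector], bias: Vector) -> List[float]:
    """BERT-style pooler: dense layer followed by tanh on the CLS vector."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, cls_vec)) + b)
        for row, b in zip(weight, bias)
    ]


def score(pooled: Vector, w_out: Vector, b_out: float) -> float:
    """Linear layer that maps the pooled vector to one scalar score."""
    return sum(w * x for w, x in zip(w_out, pooled)) + b_out


def choose(
    cls_vectors: Sequence[Vector],
    weight: Sequence[Vector],
    bias: Vector,
    w_out: Vector,
    b_out: float,
) -> Tuple[List[float], int]:
    """Score every alternative and return (scores, index of the best one)."""
    scores = [score(pool(v, weight, bias), w_out, b_out) for v in cls_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores, best


# Hypothetical 2-dimensional CLS vectors, one per alternative.
cls_vectors = [[0.5, -0.2], [-0.3, 0.1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
scores, best = choose(cls_vectors, identity, [0.0, 0.0], [1.0, 1.0], 0.0)
print(best)  # 0
```

Only relative scores matter for the final choice, which is why the head needs a single output unit per sequence rather than one per class.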

## How do I load this model?

```python
from allennlp_models.pretrained import load_predictor
predictor = load_predictor("mc-roberta-swag")
```

### Getting predictions

```python
question = "To separate egg whites from the yolk using a water bottle, you should"
alternatives = [
    "Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.",
    "Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk."
]
preds = predictor.predict(question, alternatives)
print(alternatives[preds["best_alternative"]])
# prints: Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk.
```
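You can turn the prediction dictionary into a ranking when you need more than the top pick. The sketch below assumes the dictionary also contains a `logits` list with one raw score per alternative. Check which keys your installed version actually returns. The helper applies a softmax to obtain probabilities. `fake_preds` is a hypothetical stand-in for real predictor output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_alternatives(
    preds: Dict, alternatives: Sequence[str]
) -> List[Tuple[str, float]]:
    """Return (alternative, probability) pairs, most likely first.

    Assumes preds["logits"] holds one unnormalized score per alternative.
    """
    logits = preds["logits"]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(alternatives, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical predictor output; real values come from predictor.predict(...).
fake_preds = {"logits": [1.0, 3.0], "best_alternative": 1}
ranked = rank_alternatives(fake_preds, ["first alternative", "second alternative"])
for text, prob in ranked:
    print(f"{prob:.3f}  {text}")
```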

You can also get predictions with the `allennlp` command-line interface:

```shell
echo '{"prefix": "To separate egg whites from the yolk using a water bottle, you should",' \
    '"alternatives": [' \
    '"Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.",' \
    '"Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk."' \
    ']}' | \
    allennlp predict https://storage.googleapis.com/allennlp-public-models/swag.2020-07-08.tar.gz -
```
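Hand-quoting JSON inside `echo` is error-prone. Instead, a short Python script can emit one JSON object per line with the same `prefix` and `alternatives` keys used in the `echo` example. The file name `make_inputs.py` used below is hypothetical.

```python
import json

# One record per line, using the same keys as the echo example.
examples = [
    {
        "prefix": "To separate egg whites from the yolk using a water bottle, you should",
        "alternatives": [
            "Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.",
            "Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk.",
        ],
    },
]


def to_jsonl(records) -> str:
    """Serialize each record as a single line of JSON."""
    return "".join(json.dumps(record) + "\n" for record in records)


jsonl = to_jsonl(examples)
print(jsonl, end="")
```

Saved as `make_inputs.py`, the script can replace the `echo` in the pipeline: `python make_inputs.py | allennlp predict https://storage.googleapis.com/allennlp-public-models/swag.2020-07-08.tar.gz -`. You can also redirect its output to a file and pass that path instead of `-`.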

## How do I train this model?

To train this model, use the `allennlp` CLI with the configuration file [swag.jsonnet](https://raw.githubusercontent.com/allenai/allennlp-models/v2.1.0/training_config/mc/swag.jsonnet):

```shell
allennlp train swag.jsonnet -s output_dir
```

See the [AllenNLP Training and prediction](https://guide.allennlp.org/training-and-prediction#2) guide for more details.
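For a quick smoke test before a full run, `allennlp train` accepts a JSON string of config overrides through `-o`/`--overrides`. The sketch below builds such a command in Python. The dotted keys `trainer.num_epochs` and `trainer.optimizer.lr` assume the usual AllenNLP trainer layout and should be checked against `swag.jsonnet`. The directory name `smoke_test_output` is hypothetical.

```python
import json
import shlex

# Hypothetical overrides for a short trial run; check the keys against swag.jsonnet.
overrides = {
    "trainer.num_epochs": 1,       # the full run listed on this card uses 20
    "trainer.optimizer.lr": 1e-5,  # the learning rate listed on this card
}

cmd = [
    "allennlp", "train", "swag.jsonnet",
    "-s", "smoke_test_output",
    "-o", json.dumps(overrides),
]
print(shlex.join(cmd))  # shell-quoted command line (Python 3.8+)
```

You can paste the printed line into a shell or pass `cmd` to `subprocess.run(cmd, check=True)`. Building the override with `json.dumps` avoids hand-escaping quotes.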

## Citation

```bibtex
@article{Liu2019RoBERTaAR,
 author = {Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and Luke Zettlemoyer and Veselin Stoyanov},
 journal = {ArXiv},
 title = {RoBERTa: A Robustly Optimized BERT Pretraining Approach},
 volume = {abs/1907.11692},
 year = {2019}
}
```

## Model details

| Field | Value |
| --- | --- |
| Collection | RoBERTa |
| Training techniques | AdamW |
| Architecture | Dropout, Layer Normalization, Linear Layer, RoBERTa, Tanh |
| Learning rate | 0.00001 |
| Epochs | 20 |
| Code | allenai/allennlp |
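The hyperparameters listed for this model specify AdamW with a learning rate of 0.00001. The pure-Python sketch below shows what one AdamW update does for a flat list of parameters. It keeps Adam moment estimates with bias correction and applies weight decay directly to the weights rather than folding it into the gradient. The betas, epsilon, and weight-decay defaults here are assumptions for illustration, not values taken from `swag.jsonnet`.

```python
import math
from typing import List, Sequence, Tuple


def adamw_step(
    params: Sequence[float],
    grads: Sequence[float],
    m: Sequence[float],
    v: Sequence[float],
    t: int,
    lr: float = 1e-5,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-8,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float], List[float]]:
    """Apply one AdamW update at step t (1-based); return new params, m, v."""
    b1, b2 = betas
    new_p, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = b1 * m_i + (1 - b1) * g       # first-moment estimate
        v_i = b2 * v_i + (1 - b2) * g * g   # second-moment estimate
        m_hat = m_i / (1 - b1 ** t)         # bias corrections
        v_hat = v_i / (1 - b2 ** t)
        p = p - lr * weight_decay * p       # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_p.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_p, new_m, new_v


params, m, v = adamw_step([0.2, -0.4], [0.5, -1.0], [0.0, 0.0], [0.0, 0.0], t=1)
print(params)
```

On the first step, the bias-corrected ratio `m_hat / sqrt(v_hat)` is about ±1, so each weight moves by roughly the learning rate. The decay term is subtracted from the weights directly, so the adaptive denominator does not rescale it. This is what distinguishes AdamW from Adam with L2 regularization.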
