| | |
|---|---|
| Training Techniques | SGD |
| Architecture | BERT, Dropout, Layer Normalization, Linear Layer, Tanh |
| LR | 0.01 |
The `MaskedLanguageModel` embeds some input tokens (including some which are masked), contextualizes them, then predicts targets for the masked tokens, computing a loss against the known targets.
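Before the demo code below, here is a minimal, self-contained sketch of that objective in plain PyTorch (illustrative only, not AllenNLP's actual `MaskedLanguageModel`; all sizes and the mask id here are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden = 1000, 64
embed = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(hidden, vocab_size)              # per-token vocabulary logits

token_ids = torch.randint(1, vocab_size, (1, 8))  # toy batch: 1 sentence, 8 tokens
mask_positions = torch.tensor([2, 5])             # positions to mask
targets = token_ids[0, mask_positions].clone()    # known targets at those positions
token_ids[0, mask_positions] = 0                  # pretend id 0 is [MASK]

contextual = encoder(embed(token_ids))            # embed, then contextualize
logits = head(contextual[0, mask_positions])      # predict only at masked positions
loss = F.cross_entropy(logits, targets)           # loss against the known targets
```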
Explore the live Masked Language Modeling demo at AllenNLP.
```python
from allennlp_models.pretrained import load_predictor

predictor = load_predictor("lm-masked-language-model")

sentence = "I really like %s, especially %s."
preds = predictor.predict(sentence % ("[MASK]", "[MASK]"))

# preds["words"] holds the top candidates for each [MASK];
# zip them up to fill both slots at once.
for pair in zip(*preds["words"]):
    print(sentence % pair)

# prints:
# I really like you, especially you.
# I really like him, especially now.
# I really like her, especially her.
# I really like them, especially him.
# I really like people, especially me.
```
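Since `zip(*preds["words"])` transposes, `preds["words"]` itself is a list with one entry per `[MASK]`, each holding that position's top candidates best-first (five in the run above). To inspect the positions separately, reusing the same `preds` as above:

```python
# One candidate list per masked position.
for i, candidates in enumerate(preds["words"]):
    print(f"[MASK] #{i}: {candidates}")
```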
You can also get predictions using the allennlp command-line interface:
```bash
echo '{"sentence": "I really like [MASK], especially [MASK]."}' | \
    allennlp predict https://storage.googleapis.com/allennlp-public-models/bert-masked-lm-2020-10-07.tar.gz -
```
To train this model, use the allennlp CLI tool with the configuration file `bidirectional_language_model.jsonnet`:

```bash
allennlp train bidirectional_language_model.jsonnet -s output_dir
```
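If you would rather launch the same run from Python, AllenNLP also exposes training programmatically via `train_model_from_file` (a sketch using the same config file and serialization directory as the command above):

```python
from allennlp.commands.train import train_model_from_file

# Reads the jsonnet config, trains, and writes checkpoints, logs,
# and the final model archive to output_dir.
train_model_from_file("bidirectional_language_model.jsonnet", "output_dir")
```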
See the AllenNLP Training and prediction guide for more details.
```bibtex
@inproceedings{Devlin2019BERTPO,
  author    = {J. Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  title     = {{BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  booktitle = {NAACL-HLT},
  year      = {2019}
}
```