| | |
|---|---|
| Training Techniques | SGD |
| Architecture | BERT, Dropout, Layer Normalization, Linear Layer, Tanh |
| LR | 0.01 |
The `MaskedLanguageModel` embeds some input tokens (including some which are masked), contextualizes them, then predicts targets for the masked tokens, computing a loss against the known targets.
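Before the demo code below, here is a minimal, self-contained sketch of that objective in plain PyTorch (illustrative only, not AllenNLP's actual `MaskedLanguageModel`; all sizes and the mask id here are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden = 1000, 64
embed = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(hidden, vocab_size)              # per-token vocabulary logits

token_ids = torch.randint(1, vocab_size, (1, 8))  # toy batch: 1 sentence, 8 tokens
mask_positions = torch.tensor([2, 5])             # positions to mask
targets = token_ids[0, mask_positions].clone()    # known targets at those positions
token_ids[0, mask_positions] = 0                  # pretend id 0 is [MASK]

contextual = encoder(embed(token_ids))            # embed, then contextualize
logits = head(contextual[0, mask_positions])      # predict only at masked positions
loss = F.cross_entropy(logits, targets)           # loss against the known targets
```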
Explore the live Masked Language Modeling demo at AllenNLP.
```python
from allennlp_models.pretrained import load_predictor

predictor = load_predictor("lm-masked-language-model")

sentence = "I really like %s, especially %s."
preds = predictor.predict(sentence % ("[MASK]", "[MASK]"))

# preds["words"] holds the top candidates for each [MASK];
# zip them up to fill both slots at once.
for pair in zip(*preds["words"]):
    print(sentence % pair)

# prints:
# I really like you, especially you.
# I really like him, especially now.
# I really like her, especially her.
# I really like them, especially him.
# I really like people, especially me.
```
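Since `zip(*preds["words"])` transposes, `preds["words"]` itself is a list with one entry per `[MASK]`, each holding that position's top candidates best-first (five in the run above). To inspect the positions separately, reusing the same `preds` as above:

```python
# One candidate list per masked position.
for i, candidates in enumerate(preds["words"]):
    print(f"[MASK] #{i}: {candidates}")
```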
You can also get predictions using the allennlp command-line interface:
```bash
echo '{"sentence": "I really like [MASK], especially [MASK]."}' | \
    allennlp predict https://storage.googleapis.com/allennlp-public-models/bert-masked-lm-2020-10-07.tar.gz -
```
To train this model, use the allennlp CLI tool with the configuration file `bidirectional_language_model.jsonnet`:

```bash
allennlp train bidirectional_language_model.jsonnet -s output_dir
```
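If you would rather launch the same run from Python, AllenNLP also exposes training programmatically via `train_model_from_file` (a sketch using the same config file and serialization directory as the command above):

```python
from allennlp.commands.train import train_model_from_file

# Reads the jsonnet config, trains, and writes checkpoints, logs,
# and the final model archive to output_dir.
train_model_from_file("bidirectional_language_model.jsonnet", "output_dir")
```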
See the AllenNLP Training and prediction guide for more details.
```bibtex
@inproceedings{Devlin2019BERTPO,
  author    = {J. Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  title     = {{BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  booktitle = {NAACL-HLT},
  year      = {2019}
}
```