| Parameter | Value |
| --- | --- |
| Training Techniques | AdamW |
| Architecture | Dropout, Layer Normalization, Linear Layer, Residual Network, ResNet |
| LR | 0.00004 |
ViLBERT (short for Vision-and-Language BERT) is a model for learning task-agnostic joint representations of image content and natural language.
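At the heart of ViLBERT are co-attentional transformer layers in which each modality's queries attend to the other modality's keys and values. The sketch below illustrates that exchange with PyTorch's built-in multi-head attention; the dimensions, class name, and use of `nn.MultiheadAttention` are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    """One co-attentional exchange: text attends to image, image attends to text."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.txt_attends_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, txt, img):
        # Queries come from one stream; keys and values from the other.
        txt_out, _ = self.txt_attends_img(txt, img, img)
        img_out, _ = self.img_attends_txt(img, txt, txt)
        # Residual connections keep each stream's own representation.
        return txt + txt_out, img + img_out

txt = torch.randn(1, 12, 768)   # e.g. 12 word-piece tokens
img = torch.randn(1, 36, 768)   # e.g. 36 image-region features
new_txt, new_img = CoAttentionBlock()(txt, img)
print(new_txt.shape, new_img.shape)
```

Each stream keeps its own sequence length; only the attention context crosses modalities.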
Explore the live Visual Question Answering demo at AllenNLP.
from allennlp_models.pretrained import load_predictor

# Load the pretrained ViLBERT VQA predictor (downloads the model archive on first use).
predictor = load_predictor("vqa-vilbert")

image_path = "https://storage.googleapis.com/allennlp-public-data/vqav2/baseball.jpg"
question = "What game are they playing?"
preds = predictor.predict(image_path, question)

# "probs" and "tokens" are parallel lists; pick the highest-probability answer.
best_prob, best_answer = max(zip(preds["probs"], preds["tokens"]), key=lambda x: x[0])
print(f"p({best_answer}) = {best_prob:.2%}")
# prints: p(baseball) = 100.00%
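Since `probs` and `tokens` are parallel lists, you can also rank all candidate answers rather than keeping only the best one. A small helper, assuming the same output structure as above (the `mock` dict here is fabricated illustration data, not real model output):

```python
def top_k_answers(preds, k=3):
    """Return the k most probable (answer, probability) pairs."""
    pairs = sorted(zip(preds["tokens"], preds["probs"]),
                   key=lambda x: x[1], reverse=True)
    return pairs[:k]

# Example with a mock dict shaped like the predictor's output:
mock = {"tokens": ["baseball", "soccer", "tennis"], "probs": [0.98, 0.015, 0.005]}
print(top_k_answers(mock, k=2))  # [('baseball', 0.98), ('soccer', 0.015)]
```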
You can also get predictions using the allennlp command-line interface:
echo '{"question": "What game are they playing?", "image": "https://storage.googleapis.com/allennlp-public-data/vqav2/baseball.jpg"}' | \
allennlp predict https://storage.googleapis.com/allennlp-public-models/vilbert-vqa-pretrained.2021-02-11.tar.gz -
To evaluate the model on the VQA dataset, run:
allennlp evaluate https://storage.googleapis.com/allennlp-public-models/vilbert-vqa-pretrained.2021-02-11.tar.gz \
balanced_real_val
To train this model, use the allennlp CLI tool with the configuration file vilbert_vqa_pretrained.jsonnet:
allennlp train vilbert_vqa_pretrained.jsonnet -s output_dir
See the AllenNLP Training and prediction guide for more details.
@inproceedings{Lu2019ViLBERTPT,
  author    = {Jiasen Lu and Dhruv Batra and Devi Parikh and Stefan Lee},
  booktitle = {NeurIPS},
  title     = {ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks},
  year      = {2019}
}