ViLBERT - Visual Question Answering

Last updated on Mar 15, 2021

Parameters: 245 million
File Size: 863.88 MB
Training Techniques: AdamW
Architecture: Dropout, Layer Normalization, Linear Layer, Residual Network, ResNet
Learning Rate: 0.00004
Epochs: 40

Summary

ViLBERT (short for Vision-and-Language BERT) is a model for learning task-agnostic joint representations of image content and natural language.

Explore the live Visual Question Answering demo at AllenNLP.

How do I load this model?

from allennlp_models.pretrained import load_predictor
predictor = load_predictor("vqa-vilbert")
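
If a GPU is available, load_predictor should also accept a cuda_device argument (this follows the general AllenNLP pretrained-model API; the exact signature may vary between versions, so treat this as a sketch):

from allennlp_models.pretrained import load_predictor

# cuda_device=0 places the model on the first GPU; -1 (the default) keeps it on CPU.
predictor = load_predictor("vqa-vilbert", cuda_device=0)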

Getting predictions

image_path = "https://storage.googleapis.com/allennlp-public-data/vqav2/baseball.jpg"
question = "What game are they playing?"
preds = predictor.predict(image_path, question)
best_prob, best_answer = max(zip(preds["probs"], preds["tokens"]), key=lambda x: x[0])
print(f"p({best_answer}) = {best_prob:.2%}")
# prints: p(baseball) = 100.00%
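
The returned dictionary pairs each candidate answer in preds["tokens"] with a probability in preds["probs"], so you can also inspect the most likely answers rather than just the argmax. A minimal sketch using only the keys shown above:

# Sort candidate answers by probability and show the five most likely ones.
top5 = sorted(zip(preds["probs"], preds["tokens"]), reverse=True)[:5]
for prob, answer in top5:
    print(f"{answer}: {prob:.2%}")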

You can also get predictions using the allennlp command-line interface:

echo '{"question": "What game are they playing?", "image": "https://storage.googleapis.com/allennlp-public-data/vqav2/baseball.jpg"}' | \
    allennlp predict https://storage.googleapis.com/allennlp-public-models/vilbert-vqa-pretrained.2021-02-11.tar.gz -
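
The same JSON input can be used from Python through the generic Predictor.predict_json method; the keys mirror the JSON consumed by the CLI above:

# Equivalent to the CLI call above, but from Python.
preds = predictor.predict_json({
    "question": "What game are they playing?",
    "image": "https://storage.googleapis.com/allennlp-public-data/vqav2/baseball.jpg",
})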

How do I evaluate this model?

To evaluate the model on the VQA v2 validation set (the balanced_real_val split), run:

allennlp evaluate https://storage.googleapis.com/allennlp-public-models/vilbert-vqa-pretrained.2021-02-11.tar.gz \
    balanced_real_val
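
For reference, VQA v2 scores each prediction against the ten human-annotated answers using the VQA accuracy metric, roughly min(number of matching annotators / 3, 1). The sketch below shows this simplified form; the official evaluation additionally averages over annotator subsets and normalizes answers, which is omitted here:

def vqa_accuracy(predicted_answer, human_answers):
    """Simplified VQA accuracy: full credit if at least 3 of the 10 annotators agree."""
    matches = sum(answer == predicted_answer for answer in human_answers)
    return min(matches / 3.0, 1.0)

# Example: 4 out of 10 annotators answered "baseball".
print(vqa_accuracy("baseball", ["baseball"] * 4 + ["softball"] * 6))  # 1.0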

How do I train this model?

To train this model, use the allennlp CLI tool with the configuration file vilbert_vqa_pretrained.jsonnet:

allennlp train vilbert_vqa_pretrained.jsonnet -s output_dir
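
If you prefer to launch training from Python rather than the CLI, AllenNLP exposes a train_model_from_file helper (a minimal sketch; in recent AllenNLP releases it lives in allennlp.commands.train):

from allennlp.commands.train import train_model_from_file

# Trains with the same config and serialization directory as the CLI command above.
train_model_from_file("vilbert_vqa_pretrained.jsonnet", "output_dir")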

See the AllenNLP Training and prediction guide for more details.

Citation

@inproceedings{Lu2019ViLBERTPT,
 author = {Jiasen Lu and Dhruv Batra and D. Parikh and Stefan Lee},
 booktitle = {NeurIPS},
 title = {ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks},
 year = {2019}
}