OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. Across seven vision-language datasets, OpenFlamingo models reach between 80% and 89% of the corresponding Flamingo models' performance on average. This technical report describes our models, training data, hyperparameters, and evaluation suite. We share our models and code at https://github.com/mlfoundations/open_flamingo.
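To make the interleaved image-text interface concrete, the sketch below loads a released OpenFlamingo checkpoint and runs few-shot generation with in-context image-caption examples. It is a minimal sketch, assuming the `create_model_and_transforms` helper, the `<image>` and `<|endofchunk|>` special tokens, the backbone and checkpoint identifiers, and the `vision_x` tensor layout follow the repository's README; verify names and shapes against the code at the URL above before relying on them.

```python
# Minimal sketch of few-shot, interleaved image-text prompting with OpenFlamingo.
# Helper names, checkpoint ids, and tensor shapes are assumptions based on the
# open_flamingo repository's README; local image paths are placeholders.
import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-1b-redpajama-200b",  # frozen language backbone
    tokenizer_path="anas-awadalla/mpt-1b-redpajama-200b",
    cross_attn_every_n_layers=1,  # how often gated cross-attention layers are inserted
)

# Load a released checkpoint (assumed Hugging Face repo id).
ckpt = hf_hub_download("openflamingo/OpenFlamingo-3B-vitl-mpt1b", "checkpoint.pt")
model.load_state_dict(torch.load(ckpt, map_location="cpu"), strict=False)

# Two in-context examples plus one query image, interleaved with text.
images = [Image.open(p) for p in ["cats.jpg", "sink.jpg", "query.jpg"]]  # placeholder paths
vision_x = torch.stack([image_processor(im) for im in images])  # (num_images, C, H, W)
vision_x = vision_x.unsqueeze(0).unsqueeze(2)  # (batch, num_images, num_frames=1, C, H, W)

tokenizer.padding_side = "left"
lang_x = tokenizer(
    ["<image>An image of two cats.<|endofchunk|>"
     "<image>An image of a bathroom sink.<|endofchunk|>"
     "<image>An image of"],
    return_tensors="pt",
)

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
    num_beams=3,
)
print(tokenizer.decode(generated[0]))
```

The model autoregressively continues the text prompt, attending to the interleaved images through gated cross-attention, so the completion after the final `<image>` token serves as the caption or answer for the query image.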

Task                            | Dataset     | Model                      | Metric        | Value    | Global Rank
Visual Question Answering (VQA) | InfiMM-Eval | OpenFlamingo-v2            | Overall score | 6.82     | #14
Visual Question Answering (VQA) | InfiMM-Eval | OpenFlamingo-v2            | Deductive     | 8.88     | #13
Visual Question Answering (VQA) | InfiMM-Eval | OpenFlamingo-v2            | Abductive     | 5.3      | #14
Visual Question Answering (VQA) | InfiMM-Eval | OpenFlamingo-v2            | Analogical    | 1.11     | #14
Visual Question Answering (VQA) | InfiMM-Eval | OpenFlamingo-v2            | Params        | 9B       | #1
Visual Question Answering       | MM-Vet      | OpenFlamingo-9B (MPT-7B)   | GPT-4 score   | 24.8±0.2 | #178
Visual Question Answering       | MM-Vet      | OpenFlamingo-9B (MPT-7B)   | Params        | 9B       | #1
Visual Question Answering       | MM-Vet      | OpenFlamingo-9B (LLaMA-7B) | GPT-4 score   | 21.8±0.1 | #184
Visual Question Answering       | MM-Vet      | OpenFlamingo-9B (LLaMA-7B) | Params        | 9B       | #1
Visual Question Answering       | MM-Vet v2   | OpenFlamingo-9B            | GPT-4 score   | 17.6±0.2 | #21
Visual Question Answering       | MM-Vet v2   | OpenFlamingo-9B            | Params        | 9B       | #1
