Prismer: A Vision-Language Model with Multi-Task Experts

4 Mar 2023 · Shikun Liu, Linxi Fan, Edward Johns, Zhiding Yu, Chaowei Xiao, Anima Anandkumar

Recent vision-language models have shown impressive multi-modal generation capabilities. However, they typically require training huge models on massive datasets. As a more scalable alternative, we introduce Prismer, a data- and parameter-efficient vision-language model that leverages an ensemble of task-specific experts. Prismer only requires training of a small number of components, with the majority of network weights inherited from multiple readily available, pre-trained experts and kept frozen during training. By leveraging experts from a wide range of domains, we show that Prismer can efficiently pool this expert knowledge and adapt it to various vision-language reasoning tasks. In our experiments, we show that Prismer achieves fine-tuned and few-shot learning performance competitive with the current state of the art, whilst requiring up to two orders of magnitude less training data. Code is available at https://github.com/NVlabs/prismer.
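The following is a minimal PyTorch-style sketch, not the official implementation (see the linked repository for that), illustrating the training recipe the abstract describes: pre-trained task experts and backbones are kept frozen, and only a small set of newly added modules is trained. All module and parameter names here (FrozenExpert, Adaptor, PrismerLikeModel, the fusion layer, the `context` keyword) are illustrative assumptions, not Prismer's actual API.

```python
import torch
import torch.nn as nn

class FrozenExpert(nn.Module):
    """Wraps a pre-trained task expert (e.g. a depth or segmentation model) and freezes it."""
    def __init__(self, pretrained: nn.Module):
        super().__init__()
        self.model = pretrained
        for p in self.model.parameters():
            p.requires_grad = False          # expert weights stay frozen

    @torch.no_grad()
    def forward(self, image):
        return self.model(image)             # auxiliary prediction, assumed already feature-shaped

class Adaptor(nn.Module):
    """Small trainable residual bottleneck: one of the few components that is actually trained."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.net(x)

class PrismerLikeModel(nn.Module):
    """Schematic: frozen vision backbone + frozen language decoder + frozen experts,
    with only the fusion layer and adaptor left trainable."""
    def __init__(self, vision_backbone, language_decoder, experts, dim: int):
        super().__init__()
        self.experts = nn.ModuleList(FrozenExpert(e) for e in experts)
        self.vision = vision_backbone
        self.language = language_decoder
        for p in list(self.vision.parameters()) + list(self.language.parameters()):
            p.requires_grad = False          # inherited weights are not updated
        self.fuse = nn.Linear(dim * (len(experts) + 1), dim)   # trainable
        self.adaptor = Adaptor(dim)                            # trainable

    def forward(self, image, text_tokens):
        expert_feats = [e(image) for e in self.experts]        # frozen expert outputs
        vision_feats = self.vision(image)                      # frozen backbone features
        fused = self.fuse(torch.cat([vision_feats, *expert_feats], dim=-1))
        fused = self.adaptor(fused)
        # hypothetical interface: decoder attends to fused visual context
        return self.language(text_tokens, context=fused)
```

In this sketch, only parameters with `requires_grad=True` (the fusion layer and adaptor) would be handed to the optimizer, which is what makes this style of training data- and parameter-efficient relative to training the full model end to end.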


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Image Captioning | COCO Captions | Prismer | BLEU-4 | 40.4 | #18 |
| | | | METEOR | 31.4 | #7 |
| | | | CIDEr | 136.5 | #20 |
| | | | SPICE | 24.4 | #14 |
| Image Captioning | nocaps entire | Prismer | CIDEr | 110.84 | #5 |
| | | | BLEU-1 | 84.87 | #4 |
| | | | BLEU-2 | 69.99 | #4 |
| | | | BLEU-3 | 52.48 | #4 |
| | | | BLEU-4 | 33.66 | #4 |
| | | | ROUGE-L | 60.55 | #4 |
| | | | METEOR | 31.13 | #4 |
| | | | SPICE | 14.91 | #3 |
| Image Captioning | nocaps val | Prismer | CIDEr | 107.9 | #1 |
| | | | SPICE | 14.8 | #1 |
| Visual Question Answering (VQA) | VQA v2 test-dev | Prismer | Accuracy | 78.43 | #16 |
| Visual Question Answering (VQA) | VQA v2 test-std | Prismer | Overall | 78.49 | #12 |
| | | | Yes/No | 93.09 | #4 |
| | | | Number | 61.39 | #6 |
| | | | Other | 69.70 | #4 |
