TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Image Captioning	COCO Captions	Prompt Tuning	BLEU-4	41.81	# 10
Image Captioning	COCO Captions	Prompt Tuning	METEOR	31.51	# 6
Image Captioning	COCO Captions	Prompt Tuning	CIDEr	141.4	# 15
Image Captioning	COCO Captions	Prompt Tuning	SPICE	24.42	# 13
Visual Entailment	SNLI-VE test	Prompt Tuning	Accuracy	90.12	# 2
Visual Entailment	SNLI-VE val	Prompt Tuning	Accuracy	90.04	# 2
Visual Question Answering (VQA)	VQA v2 test-std	Prompt Tuning	overall	78.53	# 11

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/prompt-tuning-for-generative-multimodal/visual-entailment-on-snli-ve-test)](https://paperswithcode.com/sota/visual-entailment-on-snli-ve-test?p=prompt-tuning-for-generative-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/prompt-tuning-for-generative-multimodal/visual-entailment-on-snli-ve-val)](https://paperswithcode.com/sota/visual-entailment-on-snli-ve-val?p=prompt-tuning-for-generative-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/prompt-tuning-for-generative-multimodal/image-captioning-on-coco-captions)](https://paperswithcode.com/sota/image-captioning-on-coco-captions?p=prompt-tuning-for-generative-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/prompt-tuning-for-generative-multimodal/visual-question-answering-on-vqa-v2-test-std)](https://paperswithcode.com/sota/visual-question-answering-on-vqa-v2-test-std?p=prompt-tuning-for-generative-multimodal)`

Prompt Tuning for Generative Multimodal Pretrained Models

4 Aug 2022 · Hao Yang, Junyang Lin, An Yang, Peng Wang, Chang Zhou, Hongxia Yang

Prompt tuning has become a new paradigm for model tuning, and it has demonstrated success in natural language pretraining and even vision pretraining. In this work, we explore the transfer of prompt tuning to multimodal pretraining, with a focus on generative multimodal pretrained models rather than contrastive ones. Specifically, we implement prompt tuning on a unified sequence-to-sequence pretrained model that adapts to both understanding and generation tasks. Experimental results demonstrate that light-weight prompt tuning can achieve performance comparable to finetuning and surpass other light-weight tuning methods. Moreover, in comparison with finetuned models, the prompt-tuned models demonstrate improved robustness against adversarial attacks. We further find that experimental factors, including the prompt length, prompt depth, and reparameterization, have a great impact on model performance, and we therefore provide empirical recommendations for the setup of prompt tuning. Despite the observed advantages, we still find some limitations in prompt tuning, and we point out corresponding directions for future studies. Code is available at https://github.com/OFA-Sys/OFA
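
The abstract names prompt length and prompt depth as the factors that shape a prompt-tuned model. As a rough illustration of why the approach counts as light-weight, the sketch below counts the trainable prompt parameters and shows the basic idea of prepending prompt vectors to a frozen model's input. It is a minimal sketch under stated assumptions, not the OFA implementation: `PromptConfig`, `prompt_param_count`, `prepend_prompts`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PromptConfig:
    """Hypothetical configuration; field names are illustrative, not OFA's."""
    prompt_length: int  # number of learnable prompt vectors per layer
    prompt_depth: int   # number of layers that receive their own prompts
    hidden_size: int    # width of each prompt vector


def prompt_param_count(cfg: PromptConfig) -> int:
    """Trainable parameters when only the prompt vectors are updated."""
    return cfg.prompt_length * cfg.prompt_depth * cfg.hidden_size


def prepend_prompts(prompts, token_embeddings):
    """Prepend prompt vectors to a sequence of token embeddings.

    Both arguments are lists of equal-width vectors (lists of floats).
    The backbone that consumes the result would stay frozen; only
    `prompts` would receive gradient updates during tuning.
    """
    width = len(token_embeddings[0])
    if any(len(p) != width for p in prompts):
        raise ValueError("prompt width must match embedding width")
    return list(prompts) + list(token_embeddings)


# Assumed values, for illustration only (not taken from the paper).
cfg = PromptConfig(prompt_length=64, prompt_depth=24, hidden_size=1024)
assumed_backbone_params = 100_000_000

trainable = prompt_param_count(cfg)
print(f"trainable prompt parameters: {trainable:,}")
print(f"share of assumed backbone: {trainable / assumed_backbone_params:.2%}")
```

Under these assumed sizes, the prompts account for a small fraction of the backbone's parameters. A reparameterization module, if used, would add further trainable parameters during training; it is left out of this count.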

Code

ofa-sys/ofa official

2,320

Tasks

Image Captioning

Visual Entailment

Visual Question Answering (VQA)

Datasets

Visual Question Answering

Visual Question Answering v2.0

RefCOCO

COCO Captions

SNLI-VE

Results from the Paper

Ranked #2 on Visual Entailment on SNLI-VE test
