Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?

27 Dec 2021  ·  Sedigheh Eslami, Gerard de Melo, Christoph Meinel

Contrastive Language-Image Pre-training (CLIP) has shown remarkable success in learning with cross-modal supervision from extensive amounts of image-text pairs collected online. Thus far, the effectiveness of CLIP has been investigated primarily in general-domain multimodal problems. This work evaluates the effectiveness of CLIP for the task of Medical Visual Question Answering (MedVQA). To this end, we present PubMedCLIP, a fine-tuned version of CLIP for the medical domain based on PubMed articles. Our experiments are conducted on two MedVQA benchmark datasets and investigate two MedVQA methods, MEVF (Mixture of Enhanced Visual Features) and QCR (Question answering via Conditional Reasoning). For each of these, we assess the merits of visual representation learning using PubMedCLIP, the original CLIP, and state-of-the-art MAML (Model-Agnostic Meta-Learning) networks pre-trained only on visual data. We open-source the code for our MedVQA pipeline and for pre-training PubMedCLIP. Both CLIP and PubMedCLIP improve over the MAML visual encoder, and PubMedCLIP achieves the best results, with gains of up to 3% in overall accuracy. Individual examples illustrate the strengths of PubMedCLIP compared to the previously widely used MAML networks. Visual representation learning with language supervision in PubMedCLIP thus leads to noticeable improvements for MedVQA. Our experiments also reveal distributional differences between the two MedVQA benchmark datasets that have not been reported in previous work and that cause different back-end visual encoders in PubMedCLIP to exhibit different behavior on these datasets. Moreover, we observe fundamental performance differences between VQA in the general domain and VQA in the medical domain.
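To make the fine-tuning recipe concrete, the sketch below shows continued contrastive training of a general-domain CLIP checkpoint on medical image-caption pairs, following the idea described in the abstract. It is a minimal sketch, not the authors' released code: it assumes the HuggingFace transformers CLIP API, the learning rate is illustrative, and the batch of images and captions is assumed to come from a caller-supplied loader over PubMed-derived pairs.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Start from the general-domain CLIP checkpoint and continue
# contrastive training on medical image-caption pairs.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # illustrative value

def contrastive_step(images, captions):
    """One update on a batch of PIL images and their caption strings."""
    batch = processor(text=captions, images=images,
                      return_tensors="pt", padding=True, truncation=True)
    batch = {k: v.to(device) for k, v in batch.items()}
    # return_loss=True makes CLIPModel compute the symmetric InfoNCE
    # loss over the in-batch image-text similarity matrix.
    out = model(**batch, return_loss=True)
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()
    return out.loss.item()

# After fine-tuning, only the visual tower is needed for MedVQA:
# its image features stand in for the vision-only (e.g., MAML) features
# in MEVF- or QCR-style answering models.
@torch.no_grad()
def image_features(images):
    pixels = processor(images=images, return_tensors="pt")["pixel_values"]
    return model.get_image_features(pixel_values=pixels.to(device))
```

The design choice this illustrates is the paper's central comparison: the visual encoder is pre-trained with language supervision (captions) rather than with purely visual meta-learning, and is then dropped into an otherwise unchanged MedVQA pipeline.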

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Medical Visual Question Answering | SLAKE-English | PubMedCLIP | Overall Accuracy | 80.1 | #6 |
| Medical Visual Question Answering | SLAKE-English | PubMedCLIP | Close-ended Accuracy | 82.5 | #7 |
| Medical Visual Question Answering | SLAKE-English | PubMedCLIP | Open-ended Accuracy | 78.4 | #4 |
| Medical Visual Question Answering | VQA-RAD | PubMedCLIP | Close-ended Accuracy | 80 | #10 |
| Medical Visual Question Answering | VQA-RAD | PubMedCLIP | Open-ended Accuracy | 60.1 | #8 |
| Medical Visual Question Answering | VQA-RAD | PubMedCLIP | Overall Accuracy | 72.1 | #9 |
