Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models

10 Mar 2023 · Tom van Sonsbeek, Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Cees G. M. Snoek, Marcel Worring

Medical Visual Question Answering (VQA) is an important challenge, as it would lead to faster and more accurate diagnoses and treatment decisions. Most existing methods approach it as a multi-class classification problem, which restricts the outcome to a predefined, closed set of curated answers. We focus on open-ended VQA and, motivated by recent advances in language models, consider it a generative task. Leveraging pre-trained language models, we introduce a novel method particularly suited for small, domain-specific medical datasets. To properly communicate the medical images to the language model, we develop a network that maps the extracted visual features to a set of learnable tokens. Then, alongside the question, these learnable tokens directly prompt the language model. We explore recent parameter-efficient fine-tuning strategies for language models, which allow for resource- and data-efficient fine-tuning. We evaluate our approach on the prime medical VQA benchmarks, namely SLAKE, OVQA, and PathVQA. The results demonstrate that our approach outperforms existing methods across various training settings while also being computationally efficient.
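The core idea is a mapping network that projects visual features into a short sequence of prefix tokens that prompt the language model alongside the question. Below is a minimal sketch of that idea in PyTorch with Hugging Face transformers; the `VisualPrefixMapper` module, its dimensions, and the prefix length are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, GPT2LMHeadModel, GPT2Tokenizer

class VisualPrefixMapper(nn.Module):
    """Maps a CLIP image embedding to a fixed number of LM prefix tokens.
    Architecture and sizes are assumptions for illustration."""
    def __init__(self, clip_dim=768, lm_dim=768, prefix_len=8):
        super().__init__()
        self.prefix_len, self.lm_dim = prefix_len, lm_dim
        self.mapper = nn.Sequential(
            nn.Linear(clip_dim, lm_dim * prefix_len),
            nn.Tanh(),
        )

    def forward(self, clip_features):            # (B, clip_dim)
        prefix = self.mapper(clip_features)      # (B, lm_dim * prefix_len)
        return prefix.view(-1, self.prefix_len, self.lm_dim)

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
mapper = VisualPrefixMapper()

pixel_values = torch.randn(1, 3, 224, 224)       # placeholder preprocessed image
img_emb = vision(pixel_values).pooler_output     # (1, 768) CLIP image embedding
prefix = mapper(img_emb)                         # (1, 8, 768) learnable prefix tokens

q_ids = tok("What abnormality is shown?", return_tensors="pt").input_ids
q_emb = lm.transformer.wte(q_ids)                # embed the question tokens
inputs_embeds = torch.cat([prefix, q_emb], dim=1)  # prefix + question prompt the LM
out = lm(inputs_embeds=inputs_embeds)            # answer is generated autoregressively
```

In this setup only the mapper (and any adapter weights) needs training; the vision encoder and most of the language model can stay frozen, which is what makes the approach suitable for small medical datasets.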

| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Medical Visual Question Answering | OVQA | CLIP-ViT w/ GPT2 (LoRA) | Free-form Accuracy | 62.6 | #1 |
| Medical Visual Question Answering | OVQA | CLIP-ViT w/ GPT2 (LoRA) | Yes/No Accuracy | 84.7 | #2 |
| Medical Visual Question Answering | OVQA | CLIP-ViT w/ GPT2 (LoRA) | Overall Accuracy | 71 | #2 |
| Medical Visual Question Answering | PathVQA | CLIP-ViT w/ GPT2 (LoRA) | Free-form Accuracy | 40 | #1 |
| Medical Visual Question Answering | PathVQA | CLIP-ViT w/ GPT2 (LoRA) | Yes/No Accuracy | 87 | #4 |
| Medical Visual Question Answering | PathVQA | CLIP-ViT w/ GPT2 (LoRA) | Overall Accuracy | 63.6 | #2 |
| Medical Visual Question Answering | SLAKE-English | CLIP-ViT w/ GPT2 (LoRA) | Overall Accuracy | 83.3 | #4 |
| Medical Visual Question Answering | SLAKE-English | CLIP-ViT w/ GPT2 (LoRA) | Close-ended Accuracy | 82.1 | #8 |
| Medical Visual Question Answering | SLAKE-English | CLIP-ViT w/ GPT2 (LoRA) | Open-ended Accuracy | 84.3 | #1 |

Methods


CLIP • GPT-2
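The leaderboard entries above use LoRA as the parameter-efficient fine-tuning strategy for GPT-2. A minimal sketch of attaching LoRA adapters with the Hugging Face PEFT library follows; the rank, alpha, dropout, and target modules are assumptions for illustration, not the paper's reported hyperparameters.

```python
from peft import LoraConfig, get_peft_model
from transformers import GPT2LMHeadModel

lm = GPT2LMHeadModel.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # low-rank update dimension (assumed value)
    lora_alpha=16,              # scaling factor (assumed value)
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.1,           # assumed value
    task_type="CAUSAL_LM",
)
lm = get_peft_model(lm, config)
lm.print_trainable_parameters()  # only the small LoRA adapters train; GPT-2 stays frozen
```

Because only the low-rank adapter matrices are updated, the number of trainable parameters drops by orders of magnitude, which is what makes fine-tuning feasible on small medical VQA datasets.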