Medical Visual Question Answering
32 papers with code • 5 benchmarks • 7 datasets
Libraries
Use these libraries to find Medical Visual Question Answering models and implementations
Most implemented papers
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
PathVQA: 30000+ Questions for Medical Visual Question Answering
To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer.
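The VQA setup described above pairs each pathology image with a question and a ground-truth answer. A minimal sketch of one such record (field names are illustrative, not the actual PathVQA schema):

```python
from dataclasses import dataclass

@dataclass
class VQASample:
    """One visual question answering example: an image plus a QA pair."""
    image_path: str   # path to the pathology image
    question: str     # natural-language question about the image
    answer: str       # ground-truth answer (free-form or yes/no)

    def is_closed_ended(self) -> bool:
        """Heuristic: yes/no answers mark closed-ended questions."""
        return self.answer.strip().lower() in {"yes", "no"}

# Example usage with hypothetical data
sample = VQASample(
    image_path="images/slide_001.png",
    question="Is a tumor present in this tissue section?",
    answer="yes",
)
print(sample.is_closed_ended())  # True
```

Datasets like PathVQA typically mix such closed-ended (yes/no) questions with open-ended ones, which is why evaluation is often reported separately for the two types.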
Flamingo: a Visual Language Model for Few-Shot Learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs.
Overcoming Data Limitation in Medical Visual Question Answering
Traditional approaches for Visual Question Answering (VQA) require large amounts of labeled data for training.
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
We show that SLAKE can be used to facilitate the development and evaluation of Med-VQA systems.
Multiple Meta-model Quantifying for Medical Visual Question Answering
Most existing medical VQA methods rely on external data for transfer learning, while the meta-data within the dataset itself is not fully utilized.
Self-supervised vision-language pretraining for Medical visual question answering
Medical image visual question answering (VQA) is the task of answering clinical questions about a given radiographic image; it is challenging because a model must integrate both vision and language information.
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial in efficiently interpreting medical images with vital clinic-relevant information.
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
To develop our dataset, we first construct two uni-modal resources: 1) The MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset.
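A multi-modal EHR QA system as described above must decide whether a question targets the imaging resource (MIMIC-CXR-VQA) or the tabular resource (EHRSQL). A naive keyword-based router sketches the idea; the keyword lists are purely illustrative and not EHRXQA's actual method:

```python
def route_modality(question: str) -> str:
    """Decide whether a question targets the imaging modality
    (chest X-ray) or the tabular EHR modality.
    Keyword lists are illustrative assumptions, not from EHRXQA."""
    image_terms = {"x-ray", "radiograph", "opacity", "cardiomegaly", "image"}
    q = question.lower()
    if any(term in q for term in image_terms):
        return "image"
    return "table"

print(route_modality("Does the chest x-ray show cardiomegaly?"))        # image
print(route_modality("What was the patient's last recorded heart rate?"))  # table
```

A real system would use a learned classifier or a structured query parser rather than keyword matching, but the routing decision itself is the core of combining the two uni-modal resources.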