Medical Visual Question Answering
26 papers with code • 5 benchmarks • 6 datasets
Most implemented papers
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
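Because BLIP-2 is widely reused as a general-purpose VQA backbone, the sketch below shows zero-shot question answering with a general-domain BLIP-2 checkpoint through the Hugging Face transformers API. It is an illustration only: the image path and question are placeholders, and Med-VQA work typically fine-tunes or adapts the model on medical data rather than using it off the shelf.

```python
# Minimal sketch: zero-shot VQA with a general-domain BLIP-2 checkpoint.
# The image path and question are placeholders; medical use normally
# requires domain adaptation or fine-tuning.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("chest_xray.png").convert("RGB")  # placeholder image
prompt = "Question: What imaging modality is shown? Answer:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```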
PathVQA: 30000+ Questions for Medical Visual Question Answering
The first step toward building an "AI pathologist" is to create a visual question answering (VQA) dataset in which the AI agent is presented with a pathology image together with a question and is asked to give the correct answer.
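To make the dataset format concrete, the snippet below shows one way a PathVQA-style (image, question, answer) record might be represented. The field names and sample values are hypothetical, not the official release schema.

```python
# Illustrative sketch of a single PathVQA-style example; the field names
# and sample values are hypothetical, not the official dataset schema.
from dataclasses import dataclass

@dataclass
class VQASample:
    image_path: str   # path to the pathology image
    question: str     # natural-language question about the image
    answer: str       # free-form or yes/no answer
    answer_type: str  # e.g. "open" vs. "closed" (yes/no)

sample = VQASample(
    image_path="images/slide_0001.jpg",
    question="Is there evidence of necrosis in this tissue section?",
    answer="yes",
    answer_type="closed",
)
```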
Flamingo: a Visual Language Model for Few-Shot Learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
Overcoming Data Limitation in Medical Visual Question Answering
Traditional approaches to Visual Question Answering (VQA) require large amounts of labeled data for training.
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
We show that SLAKE can be used to facilitate the development and evaluation of Med-VQA systems.
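Med-VQA results on SLAKE-style data are commonly reported as exact-match accuracy, split into closed (yes/no) and open questions. The helper below is a minimal sketch of that protocol; the record keys are assumptions for illustration, not the dataset's official format.

```python
# Minimal sketch of closed/open exact-match accuracy for Med-VQA evaluation.
# The 'prediction', 'answer', and 'answer_type' keys are assumed field names.
def accuracy(preds, refs):
    pairs = list(zip(preds, refs))
    if not pairs:
        return 0.0
    return sum(p.strip().lower() == r.strip().lower() for p, r in pairs) / len(pairs)

def evaluate(examples):
    """examples: list of dicts with 'prediction', 'answer', and 'answer_type'."""
    closed = [e for e in examples if e["answer_type"] == "closed"]
    open_q = [e for e in examples if e["answer_type"] == "open"]
    return {
        "closed_acc": accuracy([e["prediction"] for e in closed], [e["answer"] for e in closed]),
        "open_acc": accuracy([e["prediction"] for e in open_q], [e["answer"] for e in open_q]),
    }
```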
Multiple Meta-model Quantifying for Medical Visual Question Answering
Most existing medical VQA methods rely on external data for transfer learning, while the meta-data within the dataset itself is not fully utilized.
Self-supervised vision-language pretraining for Medical visual question answering
Medical visual question answering (VQA) is the task of answering clinical questions about a given radiographic image, a challenging problem that requires a model to integrate both vision and language information.
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images that carry vital, clinically relevant information.
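Visual instruction tuning pairs an image with an instruction-following conversation. The record below is a hypothetical illustration of what such a training example can look like; the JSON-style layout is an assumption for clarity, not the exact PMC-VQA release format.

```python
# Hypothetical visual instruction-tuning record for MedVQA: an image reference
# paired with a user/assistant exchange. Layout is illustrative only.
instruction_example = {
    "image": "figures/article_123_fig2.png",  # placeholder image reference
    "conversations": [
        {"role": "user", "content": "<image>\nWhich anatomical structure is indicated by the arrow?"},
        {"role": "assistant", "content": "The arrow points to the left ventricle."},
    ],
}
```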
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Joint image-text embedding extracted from medical images and associated contextual reports is the bedrock of most biomedical vision-and-language (V+L) tasks, including medical visual question answering, clinical image-text retrieval, and clinical report auto-generation.
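As a rough illustration of joint image-text embedding for clinical image-text retrieval, the sketch below scores a placeholder image against candidate report sentences with a general-domain CLIP checkpoint via Hugging Face transformers. The models compared in the paper are pretrained on medical data; the checkpoint, image path, and report sentences here are stand-ins.

```python
# Sketch of image-report matching in a shared embedding space, using a
# general-domain CLIP checkpoint as a stand-in; report text is invented.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

reports = [
    "No acute cardiopulmonary abnormality.",
    "Right lower lobe opacity consistent with pneumonia.",
]
image = Image.open("study_001.png").convert("RGB")  # placeholder image

inputs = processor(text=reports, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Image-to-text similarity scores; the higher-scoring report is the better match.
print(out.logits_per_image.softmax(dim=-1))
```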