Medical Visual Question Answering
32 papers with code • 5 benchmarks • 7 datasets
Latest papers
LaPA: Latent Prompt Assist Model For Medical Visual Question Answering
In this paper, we propose the Latent Prompt Assist model (LaPA) for medical visual question answering.
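The excerpt does not spell out the mechanism, but "latent prompt" methods typically introduce a small set of learnable embeddings that query the fused image-question features. The sketch below is a hypothetical illustration of that general pattern in PyTorch; the module, dimensions, and attention setup are assumptions, not LaPA's actual architecture.

```python
import torch
import torch.nn as nn

class LatentPromptFusion(nn.Module):
    """Hypothetical sketch: learnable latent prompt tokens attend over
    fused image-question features (not LaPA's actual architecture)."""

    def __init__(self, dim: int = 768, num_prompts: int = 8, num_heads: int = 8):
        super().__init__()
        # Learnable latent prompts, shared across all inputs.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, fused_feats: torch.Tensor) -> torch.Tensor:
        # fused_feats: (batch, seq_len, dim) image+question features.
        batch = fused_feats.size(0)
        queries = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # The prompts gather answer-relevant information from the features.
        out, _ = self.attn(queries, fused_feats, fused_feats)
        return out  # (batch, num_prompts, dim)

feats = torch.randn(2, 50, 768)
print(LatentPromptFusion()(feats).shape)  # torch.Size([2, 8, 768])
```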
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions, but efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records (EHR).
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Importantly, all images in this benchmark are sourced from authentic medical scenarios, ensuring alignment with the requirements of the medical field and suitability for evaluating LVLMs.
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations
Additionally, we facilitated future research and development by releasing a Python module for medical LLM evaluation and establishing a dedicated leaderboard on Hugging Face for medical domain LLMs.
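The released module itself is not shown here, but evaluation on medical multiple-choice benchmarks usually reduces to exact-match accuracy over predicted option letters. The snippet below is a generic, hypothetical scoring function in that spirit; the function name and input format are assumptions, not the released API.

```python
def score_mcq(predictions, references):
    """Exact-match accuracy for multiple-choice medical QA.

    predictions/references: lists of option letters, e.g. ["A", "C", ...].
    A hypothetical stand-in for the released evaluation module.
    """
    assert len(predictions) == len(references)
    correct = sum(p.strip().upper() == r.strip().upper()
                  for p, r in zip(predictions, references))
    return correct / len(references)

print(score_mcq(["A", "b", "C"], ["A", "B", "D"]))  # 0.666...
```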
Hallucination Benchmark in Medical Visual Question Answering
The recent success of large language and vision models (LLVMs) on visual question answering (VQA), particularly their applications in medicine (Med-VQA), has demonstrated great potential for realizing effective visual assistants for healthcare.
PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging
In this paper, we propose a parameter-efficient framework for fine-tuning MLLMs, validated specifically on medical visual question answering (Med-VQA) and medical report generation (MRG) tasks using public benchmark datasets.
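The excerpt does not name the exact adapter scheme; a common parameter-efficient recipe is LoRA via Hugging Face's peft library, sketched below on a generic causal LM. The base checkpoint and target module names are placeholders, not PeFoMed's configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint; PeFoMed fine-tunes a multimodal LLM instead.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

config = LoraConfig(
    r=8,                                  # low-rank update dimension
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of weights are trainable
```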
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
To develop our dataset, we first construct two uni-modal resources: 1) The MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset.
Med-Flamingo: a Multimodal Medical Few-shot Learner
However, existing models typically must be fine-tuned on sizeable downstream datasets, which poses a significant limitation: in many medical applications data is scarce, necessitating models that can learn from a few examples in real time.
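Flamingo-style models consume interleaved image-text prompts, so few-shot Med-VQA amounts to concatenating worked examples ahead of the query. The sketch below only assembles such a prompt with placeholder tokens; the `<image>` and `<|endofchunk|>` markers follow OpenFlamingo's convention, but the exact tokens are model-specific assumptions.

```python
def build_fewshot_prompt(examples, query_question):
    """Interleave (question, answer) demonstrations before the query.

    Each image is represented by a placeholder token; the actual pixel
    inputs are passed to the model separately, aligned in order.
    Token conventions follow OpenFlamingo but may differ per model.
    """
    parts = []
    for question, answer in examples:
        parts.append(f"<image>Question: {question} Answer: {answer}<|endofchunk|>")
    parts.append(f"<image>Question: {query_question} Answer:")
    return "".join(parts)

demos = [("Is there a fracture?", "no"),
         ("Which lobe shows consolidation?", "right lower lobe")]
print(build_fewshot_prompt(demos, "Is the heart enlarged?"))
```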
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering
Given a pair of main and reference images, the task is to answer several questions about both diseases and, more importantly, the differences between them.
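The paper builds an expert-knowledge-aware graph over the image pair, which is not reproduced here; as a point of reference, the minimal sketch below shows a naive difference-aware fusion baseline, with all names and dimensions assumed for illustration.

```python
import torch
import torch.nn as nn

class DifferenceFusion(nn.Module):
    """Hypothetical baseline: fuse main/reference image features with
    their explicit difference (not the paper's expert-knowledge graph)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, main_feats, ref_feats):
        # Concatenate both views plus their difference, then project.
        fused = torch.cat([main_feats, ref_feats, main_feats - ref_feats], dim=-1)
        return self.proj(fused)

main, ref = torch.randn(2, 512), torch.randn(2, 512)
print(DifferenceFusion()(main, ref).shape)  # torch.Size([2, 512])
```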
Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering
Medical visual question answering (VQA) is a challenging task that requires answering clinical questions about a given medical image by taking both visual and language information into account.
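The multimodal contrastive part of such pre-training is usually a symmetric InfoNCE objective over matched image-text pairs in a batch. The sketch below is a generic PyTorch version of that loss; the paper's exact combination of unimodal and multimodal losses is not reproduced here.

```python
import torch
import torch.nn.functional as F

def infonce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over matched image-text pairs in a batch.

    A generic sketch of the multimodal contrastive objective; not the
    paper's exact unimodal/multimodal loss combination.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0))        # i-th image matches i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = infonce_loss(torch.randn(4, 256), torch.randn(4, 256))
print(loss.item())
```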