Generative Visual Question Answering

4 papers with code • 1 benchmark • 1 dataset

The task of generating free-form natural-language answers to questions posed about images, rather than selecting from a fixed set of candidate answers.

Most implemented papers

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

salesforce/lavis • 30 Jan 2023

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
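BLIP-2 sidesteps this cost by keeping both the image encoder and the language model frozen and training only a lightweight bridging module (the Q-Former). A minimal sketch of running generative VQA with a BLIP-2 checkpoint through the Hugging Face `transformers` API follows; the checkpoint name and the `Question: ... Answer:` prompt template are assumptions drawn from common BLIP-2 usage, not from this page.

```python
def format_vqa_prompt(question: str) -> str:
    # Prompt template commonly used with BLIP-2 for VQA (an assumption,
    # not specified on this page).
    return f"Question: {question} Answer:"

def answer(image, question: str) -> str:
    # Heavy imports are kept local so the prompt helper above stays cheap.
    import torch
    from transformers import Blip2Processor, Blip2ForConditionalGeneration

    processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
    model = Blip2ForConditionalGeneration.from_pretrained(
        "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
    )
    inputs = processor(
        images=image, text=format_vqa_prompt(question), return_tensors="pt"
    ).to(model.device, torch.float16)
    out = model.generate(**inputs, max_new_tokens=20)
    return processor.decode(out[0], skip_special_tokens=True).strip()
```

Because only the Q-Former (and, optionally, a small projection) is trained, the bulk of the parameters never receive gradients, which is what makes the pre-training comparatively cheap.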

Flamingo: a Visual Language Model for Few-Shot Learning

mlfoundations/open_flamingo • NeurIPS 2022

Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
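Flamingo addresses this by conditioning a language model on interleaved sequences of images and text, so a handful of support examples can be packed into a single prompt. A toy sketch of building such an interleaved few-shot prompt, using the `<image>` and `<|endofchunk|>` markers that the open_flamingo implementation expects (the exact question/answer wording here is an illustrative assumption):

```python
def build_fewshot_prompt(examples, query_question):
    # examples: list of (question, answer) pairs, each paired with one
    # support image supplied to the model alongside this text.
    parts = []
    for question, answer in examples:
        parts.append(f"<image>Question: {question} Answer: {answer}<|endofchunk|>")
    # The query example is left open so the model generates the answer.
    parts.append(f"<image>Question: {query_question} Answer:")
    return "".join(parts)
```

The model then attends to each `<image>` placeholder's visual features while completing the final, unanswered chunk.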

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

xiaoman-zhang/PMC-VQA • 17 May 2023

In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images that carry vital, clinically relevant information.

Multimodal Prompt Retrieval for Generative Visual Question Answering

tossowski/multimodalpromptretrieval • 30 Jun 2023

Recent years have witnessed impressive results of pre-trained vision-language models on knowledge-intensive tasks such as visual question answering (VQA).
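The general retrieval-augmented idea behind this line of work can be sketched simply: embed the query, find the most similar stored question–answer pairs, and prepend them to the prompt. The toy below uses cosine similarity over precomputed embeddings; the function names and the scoring choice are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def retrieve_prompts(query_vec, pool_vecs, pool_qa, k=2):
    """Return the k stored (question, answer) entries whose embeddings
    are most cosine-similar to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    p = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity per pool entry
    top = np.argsort(-scores)[:k]       # indices of the k best matches
    return [pool_qa[i] for i in top]
```

The retrieved entries would then be formatted as in-context examples ahead of the query question, letting a frozen vision-language model exploit similar past answers.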