Generative Visual Question Answering
4 papers with code • 1 benchmark • 1 dataset
Generating free-form answers to questions posed about images.
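The distinguishing feature of the generative setting is that the answer is decoded token by token rather than chosen from a fixed answer set. The toy sketch below illustrates that decoding loop only; the stub scorer, vocabulary, and function names are all invented for illustration and do not correspond to any of the papers listed here, which use large vision-language models in place of the stub.

```python
# Toy sketch of generative VQA decoding (illustrative only).
# Instead of classifying over a fixed answer list, the model produces the
# answer one token at a time until an end-of-sequence token is emitted.
import numpy as np

VOCAB = ["<eos>", "a", "dog", "on", "the", "beach"]

def next_token_scores(image_feat, question, prefix):
    """Stub scorer: a real system would run a vision-language model here.
    This stub deterministically prefers the phrase 'a dog on the beach'."""
    target = ["a", "dog", "on", "the", "beach"]
    scores = np.zeros(len(VOCAB))
    if len(prefix) < len(target):
        scores[VOCAB.index(target[len(prefix)])] = 1.0
    else:
        scores[VOCAB.index("<eos>")] = 1.0
    return scores

def generate_answer(image_feat, question, max_len=10):
    """Greedy decoding: repeatedly pick the highest-scoring next token."""
    prefix = []
    for _ in range(max_len):
        tok = VOCAB[int(np.argmax(next_token_scores(image_feat, question, prefix)))]
        if tok == "<eos>":
            break
        prefix.append(tok)
    return " ".join(prefix)

image_feat = np.random.randn(256)  # stand-in for an image encoder's output
print(generate_answer(image_feat, "What is in the picture?"))
# -> a dog on the beach
```

Because the answer space is open-ended, evaluation typically compares generated text against reference answers rather than checking a class label.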
Most implemented papers
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
Flamingo: a Visual Language Model for Few-Shot Learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images containing vital, clinic-relevant information.
Multimodal Prompt Retrieval for Generative Visual Question Answering
Recent years have witnessed impressive results of pre-trained vision-language models on knowledge-intensive tasks such as visual question answering (VQA).