Multiple-choice
404 papers with code • 2 benchmarks • 10 datasets
Most implemented papers
VQA: Visual Question Answering
Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
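As a concrete illustration, here is a minimal sketch of zero-shot visual question answering with the Hugging Face transformers port of BLIP-2. The checkpoint name is a real published one; the image path and question are placeholders.

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load a pretrained BLIP-2 checkpoint (frozen image encoder + OPT-2.7B decoder).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg")  # placeholder image path
prompt = "Question: how many dogs are in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)

generated = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(generated, skip_special_tokens=True)[0].strip())
```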
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM (large vision-language model).
Flamingo: a Visual Language Model for Few-Shot Learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
GPT Takes the Bar Exam
Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice.
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
To investigate question answering with prior knowledge, we present CommonsenseQA: a challenging new dataset for commonsense question answering.
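Multiple-choice benchmarks like CommonsenseQA are commonly scored by asking a language model for the log-probability of each candidate answer and picking the highest. The sketch below shows that generic recipe with GPT-2; the question and choices are illustrative rather than drawn from the dataset, and boundary tokenization is handled only loosely.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def choice_logprob(question: str, choice: str) -> float:
    """Summed log-probability of the choice tokens given the question."""
    prompt = f"Question: {question}\nAnswer:"
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + " " + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # logits[:, t] predicts token t + 1, so shift targets by one position.
    logprobs = logits.log_softmax(-1)[0, :-1]
    targets = full_ids[0, 1:]
    sl = slice(prompt_len - 1, None)  # positions belonging to the choice
    return logprobs[sl].gather(-1, targets[sl, None]).sum().item()

question = "Where would you expect to find a beekeeper?"  # illustrative item
choices = ["apiary", "library", "garage", "courtroom", "submarine"]
print(max(choices, key=lambda c: choice_logprob(question, c)))
```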
From Recognition to Cognition: Visual Commonsense Reasoning
While visual commonsense reasoning (answering challenging questions about an image and justifying those answers) is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.
Steering Llama 2 via Contrastive Activation Addition
We introduce Contrastive Activation Addition (CAA), an innovative method for steering language models by modifying their activations during forward passes.
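The following is an illustrative sketch of the core idea on GPT-2, not the authors' implementation: take the difference of residual-stream activations between a contrastive prompt pair, then add that vector at one layer during generation via a forward hook. The layer index, steering strength, and prompts are all hypothetical, and CAA itself averages the difference over many contrastive pairs rather than using a single one.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6       # hypothetical layer to steer
STRENGTH = 4.0  # hypothetical steering coefficient

def last_token_state(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    # hidden_states[0] is the embeddings, so block LAYER outputs index LAYER + 1.
    return hs[LAYER + 1][0, -1]

# One contrastive pair stands in for the averaged dataset used in the paper.
steer = last_token_state("Is it safe to share passwords? No") \
      - last_token_state("Is it safe to share passwords? Yes")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # returning a new tuple from the hook replaces the block's output.
    return (output[0] + STRENGTH * steer,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tok("Is it safe to share passwords?", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=20)[0]))
handle.remove()  # restore the unsteered model
```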
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks.
Revisiting Visual Question Answering Baselines
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding.