Multiple-choice

404 papers with code • 2 benchmarks • 10 datasets

Multiple-choice question answering requires selecting the correct answer from a fixed set of candidate options, given a question and, in many benchmarks, additional context such as an image or video.

Most implemented papers

VQA: Visual Question Answering

ramprs/grad-cam ICCV 2015

Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
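
In the multiple-choice variant of VQA, the model is given a fixed set of candidate answers rather than producing free-form text. A minimal sketch of that evaluation protocol, with a hypothetical `score_answer` function standing in for any trained VQA model:

```python
from typing import Callable, Sequence

def pick_answer(image, question: str, candidates: Sequence[str],
                score_answer: Callable[[object, str, str], float]) -> str:
    """Return the candidate answer the scorer ranks highest.

    `score_answer` is a placeholder for any model that scores an
    (image, question, answer) triple.
    """
    scores = [score_answer(image, question, a) for a in candidates]
    return candidates[scores.index(max(scores))]
```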

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

salesforce/lavis ICML 2023

The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
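
As a quick way to try BLIP-2 on a visual question, here is a hedged sketch using the Hugging Face `transformers` port of the model (the linked salesforce/lavis repo ships its own loaders); `example.jpg` and the generation settings are placeholders:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load the transformers port of BLIP-2 (frozen image encoder + OPT-2.7B).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg")  # placeholder path
inputs = processor(
    images=image,
    text="Question: what is shown in the picture? Answer:",
    return_tensors="pt",
).to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```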

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

PKU-YuanGroup/Video-LLaVA 16 Nov 2023

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.
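
The core idea, visual features aligned into one shared space and then projected into the LLM's embedding space, can be illustrated with a small PyTorch module. This is a conceptual sketch only; the dimensions and module names are made up, not Video-LLaVA's actual code:

```python
import torch
import torch.nn as nn

class VisualProjector(nn.Module):
    """Maps unified visual tokens into the LLM's token-embedding space.

    Illustrative only: image and video encoders are assumed to have
    already been aligned into one `vis_dim`-dimensional feature space.
    """
    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (batch, num_tokens, vis_dim)
        return self.proj(visual_tokens)

tokens = torch.randn(2, 256, 1024)       # dummy unified visual tokens
llm_inputs = VisualProjector()(tokens)   # (2, 256, 4096) pseudo word embeddings
```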

Flamingo: a Visual Language Model for Few-Shot Learning

mlfoundations/open_flamingo NeurIPS 2022

Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.

GPT Takes the Bar Exam

mjbommar/gpt-takes-the-bar-exam 29 Dec 2022

Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice.

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

jonathanherzig/commonsenseqa NAACL 2019

To investigate question answering with prior knowledge, we present CommonsenseQA: a challenging new dataset for commonsense question answering.
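
CommonsenseQA is mirrored on the Hugging Face Hub (the dataset id is assumed to be `tau/commonsense_qa`); each example carries a question, five labeled choices, and an `answerKey`. A short sketch of loading it and formatting a multiple-choice prompt:

```python
from datasets import load_dataset

# Load the validation split of CommonsenseQA from the Hub.
ds = load_dataset("tau/commonsense_qa", split="validation")
ex = ds[0]

# Format the question and its five candidate answers (A-E) as one prompt.
prompt = ex["question"] + "\n" + "\n".join(
    f"{label}. {text}"
    for label, text in zip(ex["choices"]["label"], ex["choices"]["text"])
)
print(prompt)
print("Gold answer:", ex["answerKey"])
```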

From Recognition to Cognition: Visual Commonsense Reasoning

rowanz/r2c CVPR 2019

While visual commonsense reasoning is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.

Steering Llama 2 via Contrastive Activation Addition

nrimsky/caa 9 Dec 2023

We introduce Contrastive Activation Addition (CAA), an innovative method for steering language models by modifying their activations during forward passes.
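
In rough outline, CAA-style steering adds a fixed vector (e.g. the mean activation difference between contrastive prompt pairs) to one decoder layer's output at inference time. A hedged PyTorch sketch using a forward hook on a Llama-style Hugging Face model; the layer index, scale, and module path are illustrative, not the paper's settings:

```python
import torch

def add_steering_hook(model, steering_vector: torch.Tensor,
                      layer_idx: int = 13, scale: float = 1.0):
    """Register a hook that adds `scale * steering_vector` to one
    decoder layer's residual-stream output on every forward pass."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * steering_vector.to(
            device=hidden.device, dtype=hidden.dtype
        )
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    layer = model.model.layers[layer_idx]  # Llama-style module path (assumed)
    return layer.register_forward_hook(hook)

# Usage sketch: steer, generate, then remove the hook.
# handle = add_steering_hook(model, steering_vec)
# out = model.generate(**inputs)
# handle.remove()
```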

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

damo-nlp-sg/videollama2 11 Jun 2024

In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks.

Revisiting Visual Question Answering Baselines

Cold-Winter/vqs 27 Jun 2016

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding.
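
The simple baselines this paper revisits score each (image, question, answer) triple with an MLP over concatenated features, turning multiple-choice VQA into picking the top-scoring triple. A hedged sketch with illustrative feature dimensions:

```python
import torch
import torch.nn as nn

class TripleScorer(nn.Module):
    """Scores an (image, question, answer) triple with a small MLP.

    Illustrative baseline sketch: feature dimensions and layer sizes
    are made up, not the paper's exact configuration.
    """
    def __init__(self, img_dim: int = 2048, txt_dim: int = 300):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + 2 * txt_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 1),   # binary "is this answer correct?" logit
        )

    def forward(self, img_feat, q_feat, a_feat):
        return self.mlp(torch.cat([img_feat, q_feat, a_feat], dim=-1)).squeeze(-1)

scorer = TripleScorer()
img = torch.randn(4, 2048)     # image feature repeated for 4 candidates
q = torch.randn(4, 300)        # question feature repeated for 4 candidates
answers = torch.randn(4, 300)  # one feature per candidate answer
prediction = scorer(img, q, answers).argmax()  # index of the chosen answer
```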