Multiple Choice Question Answering (MCQA)
34 papers with code • 31 benchmarks • 8 datasets
A multiple-choice question (MCQ) consists of two parts: a stem, which states the question or problem, and a set of alternatives (possible answers) comprising a key, the single best answer, and a number of distractors, plausible but incorrect answers.
In a k-way MCQA task, a model is given a question q, a set of candidate options O = {O1, ..., Ok}, and a supporting context for each option, C = {C1, ..., Ck}. The model must predict the answer option that is best supported by the given contexts.
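As a rough illustration of this formulation, the sketch below scores each (context, question + option) pair with a multiple-choice head from Hugging Face Transformers and picks the argmax over the k logits. It is a minimal sketch, not the method of any paper listed here: the checkpoint name, the example question, and the pairing scheme are all illustrative, and the randomly initialized classification head would need fine-tuning on an MCQA benchmark before its predictions mean anything.

```python
# Minimal k-way MCQA sketch: score k (context, question + option) pairs
# and predict the option with the highest logit. Illustrative only.
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")

question = "Which planet is closest to the Sun?"
options = ["Mercury", "Venus", "Earth", "Mars"]
# One retrieved supporting context per option, as in the task definition.
contexts = [
    "Mercury orbits the Sun at an average distance of 58 million km.",
    "Venus is the second planet from the Sun.",
    "Earth is the third planet from the Sun.",
    "Mars is the fourth planet from the Sun.",
]

# Encode each (context, question + option) pair; k pairs for a k-way question.
enc = tokenizer(
    contexts,
    [f"{question} {opt}" for opt in options],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
# The multiple-choice head expects inputs of shape (batch, num_choices, seq_len).
batch = {name: tensor.unsqueeze(0) for name, tensor in enc.items()}

with torch.no_grad():
    logits = model(**batch).logits  # shape: (1, k)
prediction = options[logits.argmax(dim=-1).item()]
print(prediction)
```

In practice such a model is fine-tuned end to end on the target benchmark, with a softmax over the k logits giving the predicted option.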
Most implemented papers
Llama 2: Open Foundation and Fine-Tuned Chat Models
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
PaLM: Scaling Language Modeling with Pathways
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).
From Recognition to Cognition: Visual Commonsense Reasoning
While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
In this paper we propose a retriever-reader model that learns to attend on essential terms during the question answering process.
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.
QuALITY: Question Answering with Long Input Texts, Yes!
To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process.
MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension
Machine Reading Comprehension (MRC) for question answering (QA), which aims to answer a question given the relevant context passages, is an important way to test the ability of intelligent systems to understand human language.
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
Variational Open-Domain Question Answering
Retrieval-augmented models have proven to be effective in natural language processing tasks, yet there remains a lack of research on their optimization using variational inference.
Counterfactual Variable Control for Robust and Interpretable Question Answering
We then apply two novel CVC inference methods (on trained models) to capture the effect of comprehensive reasoning as the final prediction.