MMLU (Massive Multitask Language Understanding)

Introduced by Hendrycks et al. in Measuring Massive Multitask Language Understanding

MMLU (Massive Multitask Language Understanding) is a benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans. The benchmark covers 57 subjects across STEM, the humanities, the social sciences, and more. It ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem-solving ability. Subjects range from traditional areas, such as mathematics and history, to more specialized areas like law and ethics. The granularity and breadth of the subjects make the benchmark ideal for identifying a model’s blind spots.
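To make the zero-shot and few-shot settings concrete, here is a minimal sketch of how a prompt for one question might be assembled and how accuracy could be scored. It assumes four-option multiple-choice items labelled A to D and a header line in the style commonly used with MMLU; the helper names (`format_example`, `build_prompt`, `accuracy`) and the tuple layout of the examples are hypothetical, chosen only for illustration, and the model call itself is left out.

```python
from typing import List, Optional, Sequence, Tuple

LETTERS = "ABCD"  # assumption: each question has four answer options

# Hypothetical in-memory layout: (question, choices, index of correct choice)
Example = Tuple[str, Sequence[str], int]


def format_example(question: str, choices: Sequence[str],
                   answer: Optional[int] = None) -> str:
    """Render one question; include the answer letter only for solved shots."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)]
    lines.append("Answer:" if answer is None else f"Answer: {LETTERS[answer]}")
    return "\n".join(lines)


def build_prompt(subject: str, dev_examples: Sequence[Example],
                 test_question: str, test_choices: Sequence[str],
                 k: int = 5) -> str:
    """Build a k-shot prompt; k=0 gives the zero-shot setting."""
    header = ("The following are multiple choice questions (with answers) "
              f"about {subject}.\n\n")
    shots: List[str] = [format_example(q, c, a) for q, c, a in dev_examples[:k]]
    # The unanswered test question always comes last.
    blocks = shots + [format_example(test_question, test_choices)]
    return header + "\n\n".join(blocks)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted letters that match the gold letters."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

In this sketch the model would be asked to continue the prompt after the final `Answer:`, and the predicted letter would be compared with the gold letter. Scoring per subject and then averaging is one way to surface the blind spots mentioned above.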

Benchmarks

Task	Dataset Variant	Best Model
Multi-task Language Understanding	MMLU	Gemini Ultra ~1760B
Mathematical Reasoning	MMLU (Mathematics)	GAL 120B <work>
Multiple Choice Question Answering (MCQA)	MMLU (College Biology)	Med-PaLM 2
Multiple Choice Question Answering (MCQA)	MMLU (Medical Genetics)	Med-PaLM 2
Multiple Choice Question Answering (MCQA)	MMLU (Professional Medicine)	Med-PaLM 2
Multiple Choice Question Answering (MCQA)	MMLU (College Mathematics)	GAL 120B
Multiple Choice Question Answering (MCQA)	MMLU (Astronomy)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (Elementary Mathematics)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (College Chemistry)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (High School Biology)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (Formal Logic)	Gopher
Multiple Choice Question Answering (MCQA)	MMLU (Abstract Algebra)	GAL 30B
Multiple Choice Question Answering (MCQA)	MMLU (Econometrics)	Gopher
Multiple Choice Question Answering (MCQA)	MMLU (High School Computer Science)	GAL 120B
Multiple Choice Question Answering (MCQA)	MMLU (High School Mathematics)	GAL 120B
Multiple Choice Question Answering (MCQA)	MMLU (Electrical Engineering)	GAL 120B
Multiple Choice Question Answering (MCQA)	MMLU (College Physics)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (High School Statistics)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (High School Chemistry)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (High School Physics)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (Machine Learning)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (College Computer Science)	Chinchilla
Multiple Choice Question Answering (MCQA)	MMLU (Anatomy)	Med-PaLM 2
Multiple Choice Question Answering (MCQA)	MMLU (College Medicine)	Med-PaLM
Multiple Choice Question Answering (MCQA)	MMLU (Clinical Knowledge)	Med-PaLM 2