| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | ARC (Challenge) | LLaMA 13B (zero-shot) | Accuracy | 52.7 | #8 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 65B (zero-shot) | Accuracy | 56.0 | #6 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 33B (zero-shot) | Accuracy | 57.8 | #5 |
| Common Sense Reasoning | ARC (Challenge) | LLaMA 7B (zero-shot) | Accuracy | 47.6 | #13 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 13B (zero-shot) | Accuracy | 74.8 | #5 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 33B (zero-shot) | Accuracy | 80.0 | #3 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 65B (zero-shot) | Accuracy | 78.9 | #4 |
| Common Sense Reasoning | ARC (Easy) | LLaMA 7B (zero-shot) | Accuracy | 72.8 | #7 |
| Question Answering | BoolQ | LLaMA 65B (zero-shot) | Accuracy | 85.3 | #8 |
| Question Answering | BoolQ | LLaMA 7B (zero-shot) | Accuracy | 76.5 | #16 |
| Question Answering | BoolQ | LLaMA 13B (zero-shot) | Accuracy | 78.1 | #15 |
| Question Answering | BoolQ | LLaMA 33B (zero-shot) | Accuracy | 83.1 | #11 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Gender | 70.6 | #4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Religion | 79.0 | #4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Race/Color | 57.0 | #1 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Sexual Orientation | 81.0 | #4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Age | 70.1 | #4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Nationality | 64.2 | #4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Disability | 66.7 | #1 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Physical Appearance | 77.8 | #4 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Socioeconomic Status | 71.5 | #2 |
| Stereotypical Bias Analysis | CrowS-Pairs | LLaMA 65B | Overall | 66.6 | #3 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B | Accuracy | 35.6 | #29 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B | Parameters (Billions) | 33 | #29 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Accuracy | 29.3 | #32 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B-maj1@k | Parameters (Billions) | 13 | #31 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B | Accuracy | 17.8 | #37 |
| Arithmetic Reasoning | GSM8K | LLaMA 13B | Parameters (Billions) | 13 | #31 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Accuracy | 53.1 | #23 |
| Arithmetic Reasoning | GSM8K | LLaMA 33B-maj1@k | Parameters (Billions) | 33 | #29 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B-maj1@k | Accuracy | 18.1 | #34 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B-maj1@k | Parameters (Billions) | 7 | #37 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B | Accuracy | 11.0 | #39 |
| Arithmetic Reasoning | GSM8K | LLaMA 7B | Parameters (Billions) | 7 | #37 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Accuracy | 69.7 | #12 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B-maj1@k | Parameters (Billions) | 65 | #24 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B | Accuracy | 50.9 | #26 |
| Arithmetic Reasoning | GSM8K | LLaMA 65B | Parameters (Billions) | 65 | #24 |
| Sentence Completion | HellaSwag | LLaMA 13B (zero-shot) | Accuracy | 79.2 | #13 |
| Sentence Completion | HellaSwag | LLaMA 33B (zero-shot) | Accuracy | 82.8 | #8 |
| Sentence Completion | HellaSwag | LLaMA 65B (zero-shot) | Accuracy | 84.2 | #4 |
| Sentence Completion | HellaSwag | LLaMA 7B (zero-shot) | Accuracy | 76.1 | #16 |
| Code Generation | HumanEval | LLaMA 33B (zero-shot) | Pass@1 | 21.7 | #12 |
| Code Generation | HumanEval | LLaMA 33B (zero-shot) | Pass@100 | 70.7 | #3 |
| Code Generation | HumanEval | LLaMA 7B (zero-shot) | Pass@1 | 10.5 | #18 |
| Code Generation | HumanEval | LLaMA 7B (zero-shot) | Pass@100 | 36.5 | #10 |
| Code Generation | HumanEval | LLaMA 65B (zero-shot) | Pass@1 | 23.7 | #9 |
| Code Generation | HumanEval | LLaMA 65B (zero-shot) | Pass@100 | 79.3 | #1 |
| Code Generation | HumanEval | LLaMA 13B (zero-shot) | Pass@1 | 15.8 | #15 |
| Code Generation | HumanEval | LLaMA 13B (zero-shot) | Pass@100 | 52.5 | #6 |
| Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Accuracy | 20.5 | #7 |
| Math Word Problem Solving | MATH | LLaMA 65B (maj1@k) | Parameters (Billions) | 65 | #10 |
| Math Word Problem Solving | MATH | LLaMA 7B | Accuracy | 2.9 | #31 |
| Math Word Problem Solving | MATH | LLaMA 7B | Parameters (Billions) | 7 | #22 |
| Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Accuracy | 6.9 | #20 |
| Math Word Problem Solving | MATH | LLaMA 7B-maj1@k | Parameters (Billions) | 7 | #22 |
| Math Word Problem Solving | MATH | LLaMA 13B | Accuracy | 3.9 | #29 |
| Math Word Problem Solving | MATH | LLaMA 13B | Parameters (Billions) | 13 | #17 |
| Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Accuracy | 8.8 | #16 |
| Math Word Problem Solving | MATH | LLaMA 13B-maj1@k | Parameters (Billions) | 13 | #17 |
| Math Word Problem Solving | MATH | LLaMA 33B | Accuracy | 7.1 | #19 |
| Math Word Problem Solving | MATH | LLaMA 33B | Parameters (Billions) | 33 | #13 |
| Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Accuracy | 15.2 | #11 |
| Math Word Problem Solving | MATH | LLaMA 33B-maj1@k | Parameters (Billions) | 33 | #13 |
| Math Word Problem Solving | MATH | LLaMA 65B | Accuracy | 10.6 | #15 |
| Math Word Problem Solving | MATH | LLaMA 65B | Parameters (Billions) | 65 | #10 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Humanities | 55.8 | #8 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Average (%) | 57.8 | #21 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Parameters (Billions) | 33 | #24 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | STEM | 46.0 | #13 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Social Sciences | 66.7 | #8 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Other | 63.4 | #8 |
| Multi-task Language Understanding | MMLU | LLaMA 33B (few-shot, k=5) | Tokens (Billions) | 1400 | #1 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Humanities | 45.0 | #12 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Average (%) | 46.9 | #31 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Parameters (Billions) | 13 | #20 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | STEM | 35.8 | #20 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Social Sciences | 53.8 | #12 |
| Multi-task Language Understanding | MMLU | LLaMA 13B (few-shot, k=5) | Other | 53.3 | #11 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Humanities | 34.0 | #15 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Average (%) | 35.1 | #40 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Parameters (Billions) | 7 | #11 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | STEM | 30.5 | #24 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Social Sciences | 38.3 | #15 |
| Multi-task Language Understanding | MMLU | LLaMA 7B (few-shot, k=5) | Other | 38.1 | #15 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Humanities | 61.8 | #7 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Average (%) | 63.4 | #17 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Parameters (Billions) | 65 | #30 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | STEM | 51.7 | #10 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Social Sciences | 72.9 | #6 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Other | 67.4 | #6 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (few-shot, k=5) | Tokens (Billions) | 1400 | #1 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Average (%) | 68.9 | #13 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Parameters (Billions) | 65 | #30 |
| Multi-task Language Understanding | MMLU | LLaMA 65B (fine-tuned) | Tokens (Billions) | 1400 | #1 |
| Question Answering | Natural Questions | LLaMA 65B (one-shot) | EM | 31.0 | #22 |
| Question Answering | Natural Questions | LLaMA 65B (few-shot, k=5) | EM | 35.0 | #20 |
| Question Answering | Natural Questions | LLaMA 65B (few-shot, k=64) | EM | 39.9 | #17 |
| Question Answering | Natural Questions | LLaMA 33B (zero-shot) | EM | 24.9 | #27 |
| Question Answering | OBQA | LLaMA 33B (zero-shot) | Accuracy | 58.6 | #3 |
| Question Answering | OBQA | LLaMA 13B (zero-shot) | Accuracy | 56.4 | #6 |
| Question Answering | OBQA | LLaMA 7B (zero-shot) | Accuracy | 57.2 | #5 |
| Question Answering | OBQA | LLaMA 65B (zero-shot) | Accuracy | 60.2 | #2 |
| Question Answering | PIQA | LLaMA 65B (zero-shot) | Accuracy | 82.8 | #1 |
| Question Answering | PIQA | LLaMA 13B (zero-shot) | Accuracy | 80.1 | #9 |
| Question Answering | PIQA | LLaMA 33B (zero-shot) | Accuracy | 82.3 | #2 |
| Question Answering | PIQA | LLaMA 7B (zero-shot) | Accuracy | 79.8 | #11 |
| Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (High) | 48.3 | #9 |
| Reading Comprehension | RACE | LLaMA 33B (zero-shot) | Accuracy (Middle) | 64.1 | #10 |
| Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (High) | 51.6 | #7 |
| Reading Comprehension | RACE | LLaMA 65B (zero-shot) | Accuracy (Middle) | 67.9 | #8 |
| Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (High) | 46.9 | #12 |
| Reading Comprehension | RACE | LLaMA 7B (zero-shot) | Accuracy (Middle) | 61.1 | #12 |
| Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (High) | 47.2 | #11 |
| Reading Comprehension | RACE | LLaMA 13B (zero-shot) | Accuracy (Middle) | 61.6 | #11 |
| Question Answering | SIQA | LLaMA 33B (zero-shot) | Accuracy | 50.4 | #4 |
| Question Answering | SIQA | LLaMA 13B (zero-shot) | Accuracy | 50.4 | #4 |
| Question Answering | SIQA | LLaMA 7B (zero-shot) | Accuracy | 48.9 | #6 |
| Question Answering | SIQA | LLaMA 65B (zero-shot) | Accuracy | 52.3 | #1 |
| Question Answering | TriviaQA | LLaMA 65B (zero-shot) | EM | 68.2 | #16 |
| Question Answering | TriviaQA | LLaMA 65B (few-shot, k=5) | EM | 72.6 | #9 |
| Question Answering | TriviaQA | LLaMA 65B (few-shot, k=64) | EM | 73.0 | #8 |
| Question Answering | TriviaQA | LLaMA 65B (one-shot) | EM | 71.6 | #12 |
| Question Answering | TruthfulQA | LLaMA 13B | % true | 47 | #4 |
| Question Answering | TruthfulQA | LLaMA 13B | % info | 41 | #7 |
| Question Answering | TruthfulQA | LLaMA 65B | % true | 57 | #1 |
| Question Answering | TruthfulQA | LLaMA 65B | % info | 53 | #5 |
| Question Answering | TruthfulQA | LLaMA 33B | % true | 52 | #3 |
| Question Answering | TruthfulQA | LLaMA 33B | % info | 48 | #6 |
| Question Answering | TruthfulQA | LLaMA 7B | % true | 33 | #5 |
| Question Answering | TruthfulQA | LLaMA 7B | % info | 29 | #8 |
| Common Sense Reasoning | WinoGrande | LLaMA 7B (zero-shot) | Accuracy | 70.1 | #11 |
| Common Sense Reasoning | WinoGrande | LLaMA 65B (zero-shot) | Accuracy | 77.0 | #4 |
| Common Sense Reasoning | WinoGrande | LLaMA 33B (zero-shot) | Accuracy | 76.0 | #7 |
| Common Sense Reasoning | WinoGrande | LLaMA 13B (zero-shot) | Accuracy | 73.0 | #9 |