Textbooks Are All You Need II: phi-1.5 technical report

11 Sep 2023  ·  Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations; encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.


Results from the Paper


| Task                              | Dataset         | Model                         | Metric      | Value | Global Rank |
|-----------------------------------|-----------------|-------------------------------|-------------|-------|-------------|
| Common Sense Reasoning            | ARC (Challenge) | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 44.9  | #36         |
| Common Sense Reasoning            | ARC (Easy)      | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 76.1  | #17         |
| Code Generation                   | HumanEval       | phi-1.5-web 1.3B              | Pass@1      | 41.4  | #60         |
| Code Generation                   | MBPP            | phi-1.5-web 1.3B              | Accuracy    | 43.5  | #66         |
| Multi-task Language Understanding | MMLU            | phi-1.5-web 1.3B              | Average (%) | 37.9  | #81         |
| Question Answering                | PIQA            | phi-1.5-web 1.3B              | Accuracy    | 77.0  | #37         |
| Question Answering                | SIQA            | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 53.0  | #12         |
| Question Answering                | SIQA            | phi-1.5 1.3B (zero-shot)      | Accuracy    | 52.6  | #13         |
| Common Sense Reasoning            | WinoGrande      | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 74.0  | #27         |
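The table above reports Pass@1 for HumanEval. As a point of reference, the standard unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021) -- not something specific to phi-1.5 -- can be sketched as follows, where n samples are drawn per problem and c of them pass the unit tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total generated samples per problem
    c: number of samples that pass the unit tests
    k: samples considered (k=1 gives Pass@1)
    """
    if n - c < k:
        # Fewer failing samples than k, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 the estimator reduces to the empirical pass rate c/n.
print(pass_at_k(10, 2, 1))  # → 0.2
```

A benchmark-level Pass@1 score like the 41.4 above is then the mean of this estimate over all problems in the suite.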

Methods


No methods listed for this paper.