Textbooks Are All You Need II: phi-1.5 technical report

11 Sep 2023  ·  Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee

We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations; encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.


Results from the Paper


| Task                              | Dataset         | Model                         | Metric      | Value | Global Rank |
|-----------------------------------|-----------------|-------------------------------|-------------|-------|-------------|
| Common Sense Reasoning            | ARC (Challenge) | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 44.9  | #36         |
| Common Sense Reasoning            | ARC (Easy)      | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 76.1  | #17         |
| Code Generation                   | HumanEval       | phi-1.5-web 1.3B              | Pass@1      | 41.4  | #60         |
| Code Generation                   | MBPP            | phi-1.5-web 1.3B              | Accuracy    | 43.5  | #66         |
| Multi-task Language Understanding | MMLU            | phi-1.5-web 1.3B              | Average (%) | 37.9  | #81         |
| Question Answering                | PIQA            | phi-1.5-web 1.3B              | Accuracy    | 77.0  | #37         |
| Question Answering                | SIQA            | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 53.0  | #12         |
| Question Answering                | SIQA            | phi-1.5 1.3B (zero-shot)      | Accuracy    | 52.6  | #13         |
| Common Sense Reasoning            | WinoGrande      | phi-1.5-web 1.3B (zero-shot)  | Accuracy    | 74.0  | #27         |
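The table above reports Pass@1 for HumanEval. As a point of reference, the standard unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021) -- not something specific to phi-1.5 -- can be sketched as follows, where n samples are drawn per problem and c of them pass the unit tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total generated samples per problem
    c: number of samples that pass the unit tests
    k: samples considered (k=1 gives Pass@1)
    """
    if n - c < k:
        # Fewer failing samples than k, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 the estimator reduces to the empirical pass rate c/n.
print(pass_at_k(10, 2, 1))  # → 0.2
```

A benchmark-level Pass@1 score like the 41.4 above is then the mean of this estimate over all problems in the suite.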

Methods


No methods listed for this paper.