Galactica: A Large Language Model for Science

Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.
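The abstract quotes its headline comparisons as pairs of accuracies. As a small worked illustration (not part of the paper), the sketch below restates them as absolute margins in percentage points and as relative gains over the baseline. The names `comparisons` and `margin` are hypothetical, introduced only for this example; the numbers are copied from the abstract.

```python
# Scores (in %) quoted in the abstract, stored as (Galactica, baseline).
comparisons = {
    "LaTeX equations vs GPT-3": (68.2, 49.0),
    "Mathematical MMLU vs Chinchilla": (41.3, 35.7),
    "MATH vs PaLM 540B": (20.4, 8.8),
}


def margin(galactica: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative gain in %)."""
    absolute = galactica - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


for name, (gal, base) in comparisons.items():
    points, relative = margin(gal, base)
    print(f"{name}: +{points} points ({relative}% relative)")
```

Read this way, the gaps are 19.2, 5.6 and 11.6 percentage points respectively. The percentage figures in the abstract are the two scores being compared, not the size of the improvement.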

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Knowledge Probing AminoProbe GAL 120B (zero-shot) Accuracy 21 # 1
Knowledge Probing AminoProbe OPT (zero-shot) Accuracy 12 # 7
Knowledge Probing AminoProbe GAL 30B (zero-shot) Accuracy 21 # 1
Knowledge Probing AminoProbe BLOOM (zero-shot) Accuracy 14 # 5
Knowledge Probing AminoProbe GPT-3 (text-davinci-002) (zero-shot) Accuracy 14 # 5
Knowledge Probing AminoProbe GAL 125M (zero-shot) Accuracy 12 # 7
Knowledge Probing AminoProbe GAL 6.7B (zero-shot) Accuracy 17 # 3
Knowledge Probing AminoProbe GAL 1.3B (zero-shot) Accuracy 16 # 4
Common Sense Reasoning ARC (Challenge) BLOOM (few-shot, k=5) Accuracy 32.9 # 44
Common Sense Reasoning ARC (Challenge) OPT (few-shot, k=5) Accuracy 31.1 # 46
Common Sense Reasoning ARC (Challenge) GPT-3 (zero-shot) Accuracy 51.4 # 27
Common Sense Reasoning ARC (Challenge) GAL 120B (zero-shot) Accuracy 67.9 # 15
Common Sense Reasoning ARC (Easy) OPT (5-shot) Accuracy 37.4 # 41
Common Sense Reasoning ARC (Easy) GAL 120B (0-shot) Accuracy 83.8 # 7
Common Sense Reasoning ARC (Easy) GPT-3 (zero-shot) Accuracy 68.8 # 33
Common Sense Reasoning ARC (Easy) BLOOM (5-shot) Accuracy 40.7 # 39
Molecular Property Prediction BACE GAL 1.3B ROC-AUC 57.6 # 15
Molecular Property Prediction BACE GAL 30B ROC-AUC 72.7 # 12
Molecular Property Prediction BACE GAL 125M ROC-AUC 56.1 # 16
Molecular Property Prediction BACE GAL 6.7B ROC-AUC 58.4 # 14
Molecular Property Prediction BACE GAL 120B ROC-AUC 61.7 # 13
Molecular Property Prediction BBBP GAL 1.3B ROC-AUC 60.4 # 14
Molecular Property Prediction BBBP Uni-Mol ROC-AUC 72.9 # 3
Molecular Property Prediction BBBP GAL 30B ROC-AUC 59.6 # 15
Molecular Property Prediction BBBP GAL 120B ROC-AUC 66.1 # 13
Molecular Property Prediction BBBP GAL 6.7B ROC-AUC 53.5 # 16
Molecular Property Prediction BBBP GAL 125M ROC-AUC 39.3 # 17
Word Sense Disambiguation BIG-bench (Anachronisms) OPT 175B Accuracy 49.1 # 3
Word Sense Disambiguation BIG-bench (Anachronisms) GAL 120B (few-shot, k=5) Accuracy 48.7 # 4
Word Sense Disambiguation BIG-bench (Anachronisms) GAL 30B (few-shot, k=5) Accuracy 47.0 # 5
Word Sense Disambiguation BIG-bench (Anachronisms) BLOOM 176B Accuracy 1.3 # 6
Question Answering BioASQ BLOOM (zero-shot) Accuracy 91.4 # 3
Question Answering BioASQ GAL 120B (zero-shot) Accuracy 94.3 # 2
Question Answering BioASQ OPT (zero-shot) Accuracy 81.4 # 6
Knowledge Probing BioLAMA GAL 125M Accuracy 3.1 # 8
Knowledge Probing BioLAMA GPT-3 (text-davinci-002) (zero-shot) Accuracy 8.4 # 2
Knowledge Probing BioLAMA BLOOM (zero-shot) Accuracy 9.7 # 1
Knowledge Probing BioLAMA GAL 1.3B Accuracy 7.2 # 5
Knowledge Probing BioLAMA OPT (zero-shot) Accuracy 7.1 # 6
Knowledge Probing BioLAMA GAL 30B Accuracy 6.9 # 7
Knowledge Probing BioLAMA GAL 6.7B Accuracy 7.9 # 4
Knowledge Probing BioLAMA GAL 120B Accuracy 8.0 # 3
Protein Structure Prediction CASPSeq GAL 125M Validation perplexity 20.62 # 5
Protein Structure Prediction CASPSeq GAL 120B Validation perplexity 17.26 # 1
Protein Structure Prediction CASPSeq GAL 6.7B Validation perplexity 17.29 # 3
Protein Structure Prediction CASPSeq GAL 30B Validation perplexity 17.27 # 2
Protein Structure Prediction CASPSeq GAL 1.3B Validation perplexity 17.58 # 4
Protein Structure Prediction CASPSimSeq GAL 30B Validation perplexity 15.42 # 2
Protein Structure Prediction CASPSimSeq GAL 1.3B Validation perplexity 17.04 # 4
Protein Function Prediction CASPSimSeq GAL 120B ROUGE-L 0.252 # 1
Protein Structure Prediction CASPSimSeq GAL 125M Validation perplexity 19.18 # 5
Protein Annotation CASPSimSeq GAL 125M F1 score 0.105 # 5
Protein Function Prediction CASPSimSeq GAL 1.3B ROUGE-L 0.069 # 4
Protein Function Prediction CASPSimSeq GAL 125M ROUGE-L 0.062 # 5
Protein Function Prediction CASPSimSeq GAL 30B ROUGE-L 0.137 # 2
Protein Function Prediction CASPSimSeq GAL 6.7B ROUGE-L 0.109 # 3
Protein Structure Prediction CASPSimSeq GAL 6.7B Validation perplexity 16.35 # 3
Protein Structure Prediction CASPSimSeq GAL 120B Validation perplexity 12.77 # 1
Protein Annotation CASPSimSeq GAL 1.3B F1 score 0.174 # 4
Protein Annotation CASPSimSeq GAL 6.7B F1 score 0.184 # 3
Protein Annotation CASPSimSeq GAL 120B F1 score 0.219 # 2
Protein Annotation CASPSimSeq GAL 30B F1 score 0.22 # 1
Knowledge Probing Chemical Reactions OPT (zero-shot) Accuracy 12.7 # 7
Knowledge Probing Chemical Reactions GAL 125M Accuracy 0.3 # 8
Knowledge Probing Chemical Reactions GPT-3 (text-davinci-002) (zero-shot) Accuracy 35.1 # 3
Knowledge Probing Chemical Reactions BLOOM (zero-shot) Accuracy 22.4 # 5
Knowledge Probing Chemical Reactions GAL 1.3B Accuracy 14.4 # 6
Knowledge Probing Chemical Reactions GAL 6.7B Accuracy 26.4 # 4
Knowledge Probing Chemical Reactions GAL 120B Accuracy 43.1 # 1
Knowledge Probing Chemical Reactions GAL 30B Accuracy 36.5 # 2
Molecular Property Prediction ClinTox GAL 30B ROC-AUC 82.2 # 9; Molecules (M) 2 # 6
Molecular Property Prediction ClinTox GAL 125M ROC-AUC 51.8 # 17; Molecules (M) 2 # 6
Molecular Property Prediction ClinTox GAL 6.7B ROC-AUC 78.4 # 11; Molecules (M) 2 # 6
Molecular Property Prediction ClinTox GAL 1.3B ROC-AUC 58.9 # 15; Molecules (M) 2 # 6
Molecular Property Prediction ClinTox GAL 120B ROC-AUC 82.6 # 8; Molecules (M) 2 # 6
Citation Prediction Contextual Citations GAL 125M Accuracy 7.1 # 6
Citation Prediction Contextual Citations Dense Retriever (fine-tuned) Accuracy 8.2 # 5
Citation Prediction Contextual Citations GAL 1.3B Accuracy 15.9 # 4
Citation Prediction Contextual Citations GAL 6.7B Accuracy 23 # 3
Citation Prediction Contextual Citations GAL 30B Accuracy 31.5 # 2
Citation Prediction Contextual Citations GAL 120B Accuracy 36.6 # 1
Citation Prediction Contextual Citations Sparse Retriever Accuracy 5.3 # 7
Citation Prediction Contextual Citations Dense Retriever Accuracy 1.6 # 8
Stereotypical Bias Analysis CrowS-Pairs GAL 120B Gender 51.9 # 1; Religion 51.9 # 1; Race/Color 59.9 # 2; Sexual Orientation 77.4 # 2; Age 69 # 3; Nationality 51.6 # 1; Disability 66.7 # 1; Physical Appearance 58.7 # 1; Socioeconomic status 65.7 # 1; Overall 60.5 # 4
Citation Prediction Extended Citations Dense Retriever (fine-tuned) Accuracy 11.8 # 6
Citation Prediction Extended Citations GAL 1.3B Accuracy 45.5 # 4
Citation Prediction Extended Citations GAL 6.7B Accuracy 60 # 3
Citation Prediction Extended Citations GAL 30B Accuracy 66.4 # 2
Citation Prediction Extended Citations GAL 125M Accuracy 6.4 # 8
Citation Prediction Extended Citations GAL 120B Accuracy 69.1 # 1
Citation Prediction Extended Citations Sparse Retriever Accuracy 17.3 # 5
Citation Prediction Extended Citations Dense Retriever Accuracy 8.8 # 7
Knowledge Probing Galaxy Clusters GAL 125M Accuracy 6.7 # 8
Knowledge Probing Galaxy Clusters GAL 120B Accuracy 24.2 # 1
Knowledge Probing Galaxy Clusters GAL 6.7B Accuracy 17.5 # 5
Knowledge Probing Galaxy Clusters OPT (zero-shot) Accuracy 21.7 # 2
Knowledge Probing Galaxy Clusters GAL 1.3B Accuracy 14.2 # 7
Knowledge Probing Galaxy Clusters BLOOM (zero-shot) Accuracy 15 # 6
Knowledge Probing Galaxy Clusters GPT-3 (text-davinci-002) (zero-shot) Accuracy 20.8 # 3
Knowledge Probing Galaxy Clusters GAL 30B Accuracy 20 # 4
Molecular Property Prediction HIV dataset GAL 30B AUC 0.759 # 5
Molecular Property Prediction HIV dataset Uni-Mol AUC 0.808 # 2
Molecular Property Prediction HIV dataset GAL 120B AUC 0.745 # 6
Molecular Property Prediction HIV dataset GAL 6.7B AUC 0.722 # 8
Molecular Property Prediction HIV dataset GAL 1.3B AUC 0.724 # 7
Molecular Property Prediction HIV dataset GAL 125M AUC 0.702 # 9
IUPAC Name Prediction IUPAC GAL 125M Accuracy 0 # 5
IUPAC Name Prediction IUPAC GAL 6.7B Accuracy 10.7 # 3
IUPAC Name Prediction IUPAC GAL 30B Accuracy 15.4 # 2
IUPAC Name Prediction IUPAC GAL 1.3B Accuracy 2.5 # 4
IUPAC Name Prediction IUPAC GAL 120B Accuracy 39.2 # 1
Knowledge Probing Latex Equations GAL 120B (zero-shot) Accuracy 68.2 # 1
Knowledge Probing Latex Equations OPT (zero-shot) Accuracy 8.9 # 7
Knowledge Probing Latex Equations BLOOM (zero-shot) Accuracy 21.4 # 5
Knowledge Probing Latex Equations GPT-3 (text-davinci-002) (zero-shot) Accuracy 49 # 3
Knowledge Probing Latex Equations GAL 125M (zero-shot) Accuracy 0.5 # 8
Knowledge Probing Latex Equations GAL 1.3B (zero-shot) Accuracy 20.5 # 6
Knowledge Probing Latex Equations GAL 6.7B (zero-shot) Accuracy 41.7 # 4
Knowledge Probing Latex Equations GAL 30B (zero-shot) Accuracy 51.5 # 2
Math Word Problem Solving MATH GAL 120B <work> Accuracy 16.6 # 81; Parameters (Billions) 120 # 8
Math Word Problem Solving MATH PaLM 540B (5-shot) mCoT Accuracy 8.8 # 91; Parameters (Billions) 540 # 1
Math Word Problem Solving MATH GAL 30B <work> Accuracy 11.4 # 88; Parameters (Billions) 30 # 36
Math Word Problem Solving MATH GPT-3 175B (8-shot) Accuracy 5.2 # 102; Parameters (Billions) 175 # 5
Math Word Problem Solving MATH GAL 30B (5-shot) mCoT Accuracy 12.7 # 86; Parameters (Billions) 30 # 36
Math Word Problem Solving MATH Minerva 540B (5-shot) mCoT Accuracy 33.6 # 56; Parameters (Billions) 540 # 1
Math Word Problem Solving MATH GAL 120B (5-shot) mCoT Accuracy 20.4 # 77; Parameters (Billions) 120 # 8
Multiple Choice Question Answering (MCQA) MedMCQA GAL 120B (zero-shot) Dev Set (Acc-%) 52.9 # 8
Multiple Choice Question Answering (MCQA) MedMCQA BLOOM (few-shot, k=5) Dev Set (Acc-%) 32.5 # 16
Multiple Choice Question Answering (MCQA) MedMCQA OPT (few-shot, k=5) Dev Set (Acc-%) 29.6 # 17
Question Answering MedQA GAL 120B (zero-shot) Accuracy 44.4 # 15
Question Answering MedQA BLOOM (few-shot, k=5) Accuracy 23.3 # 21
Question Answering MedQA OPT (few-shot, k=5) Accuracy 22.8 # 22
Knowledge Probing Mineral Groups GAL 1.3B Accuracy 10.3 # 4
Knowledge Probing Mineral Groups OPT (zero-shot) Accuracy 1.6 # 7
Knowledge Probing Mineral Groups GAL 6.7B Accuracy 8.7 # 6
Knowledge Probing Mineral Groups GAL 30B Accuracy 17.5 # 3
Knowledge Probing Mineral Groups BLOOM (zero-shot) Accuracy 10.3 # 4
Knowledge Probing Mineral Groups GPT-3 (text-davinci-002) (zero-shot) Accuracy 18.3 # 2
Knowledge Probing Mineral Groups GAL 125M Accuracy 0.0 # 8
Knowledge Probing Mineral Groups GAL 120B Accuracy 29.4 # 1
Multi-task Language Understanding MMLU GAL 120B (zero-shot) Average (%) 52.6 # 62
Multiple Choice Question Answering (MCQA) MMLU (Abstract Algebra) Gopher (few-shot, k=5) Accuracy 25 # 4
Multiple Choice Question Answering (MCQA) MMLU (Abstract Algebra) Chinchilla (few-shot, k=5) Accuracy 31 # 2
Multiple Choice Question Answering (MCQA) MMLU (Abstract Algebra) OPT (few-shot, k=5) Accuracy 21 # 5
Multiple Choice Question Answering (MCQA) MMLU (Abstract Algebra) GAL 120B (zero-shot) Accuracy 27 # 3
Multiple Choice Question Answering (MCQA) MMLU (Abstract Algebra) GAL 30B (zero-shot) Accuracy 33.3 # 1
Multiple Choice Question Answering (MCQA) MMLU (Astronomy) Gopher (few-shot, k=5) Accuracy 65.8 # 2
Multiple Choice Question Answering (MCQA) MMLU (Astronomy) OPT (few-shot, k=5) Accuracy 23.0 # 5
Multiple Choice Question Answering (MCQA) MMLU (Astronomy) BLOOM (few-shot, k=5) Accuracy 25.7 # 4
Multiple Choice Question Answering (MCQA) MMLU (Astronomy) Chinchilla (few-shot, k=5) Accuracy 73.0 # 1
Multiple Choice Question Answering (MCQA) MMLU (Astronomy) GAL 120B (zero-shot) Accuracy 65.1 # 3
Multiple Choice Question Answering (MCQA) MMLU (College Biology) BLOOM (few-shot, k=5) Accuracy 28.5 # 8
Multiple Choice Question Answering (MCQA) MMLU (College Biology) Gopher (few-shot, k=5) Accuracy 70.8 # 5
Multiple Choice Question Answering (MCQA) MMLU (College Biology) GAL 120B (zero-shot) Accuracy 68.8 # 6
Multiple Choice Question Answering (MCQA) MMLU (College Biology) Chinchilla (few-shot, k=5) Accuracy 79.9 # 4
Multiple Choice Question Answering (MCQA) MMLU (College Biology) OPT (few-shot, k=5) Accuracy 30.6 # 7
Multiple Choice Question Answering (MCQA) MMLU (College Chemistry) Chinchilla (few-shot, k=5) Accuracy 51 # 1
Multiple Choice Question Answering (MCQA) MMLU (College Chemistry) Gopher (few-shot, k=5) Accuracy 45 # 3
Multiple Choice Question Answering (MCQA) MMLU (College Chemistry) BLOOM (few-shot, k=5) Accuracy 19 # 5
Multiple Choice Question Answering (MCQA) MMLU (College Chemistry) GAL 120B (zero-shot) Accuracy 46 # 2
Multiple Choice Question Answering (MCQA) MMLU (College Chemistry) OPT (few-shot, k=5) Accuracy 30 # 4
Multiple Choice Question Answering (MCQA) MMLU (College Computer Science) OPT (few-shot, k=5) Accuracy 17.0 # 3
Multiple Choice Question Answering (MCQA) MMLU (College Computer Science) BLOOM (few-shot, k=5) Accuracy 6.0 # 4
Multiple Choice Question Answering (MCQA) MMLU (College Computer Science) Chinchilla (few-shot, k=5) Accuracy 51.0 # 1
Multiple Choice Question Answering (MCQA) MMLU (College Computer Science) GAL 120B (zero-shot) Accuracy 49 # 2
Multiple Choice Question Answering (MCQA) MMLU (College Mathematics) OPT (few-shot, k=5) Accuracy 33 # 3
Multiple Choice Question Answering (MCQA) MMLU (College Mathematics) BLOOM (few-shot, k=5) Accuracy 25 # 5
Multiple Choice Question Answering (MCQA) MMLU (College Mathematics) GAL 120B (zero-shot) Accuracy 43 # 1
Multiple Choice Question Answering (MCQA) MMLU (College Mathematics) Chinchilla (few-shot, k=5) Accuracy 32 # 4
Multiple Choice Question Answering (MCQA) MMLU (College Mathematics) Gopher (few-shot, k=5) Accuracy 37 # 2
Multiple Choice Question Answering (MCQA) MMLU (College Physics) OPT (few-shot, k=5) Accuracy 21.6 # 4
Multiple Choice Question Answering (MCQA) MMLU (College Physics) GAL 120B (zero-shot) Accuracy 42.2 # 2
Multiple Choice Question Answering (MCQA) MMLU (College Physics) Chinchilla (few-shot, k=5) Accuracy 46.1 # 1
Multiple Choice Question Answering (MCQA) MMLU (College Physics) BLOOM (few-shot, k=5) Accuracy 18.6 # 5
Multiple Choice Question Answering (MCQA) MMLU (College Physics) Gopher (few-shot, k=5) Accuracy 34.3 # 3
Multiple Choice Question Answering (MCQA) MMLU (Econometrics) Chinchilla (few-shot, k=5) Accuracy 38.6 # 3
Multiple Choice Question Answering (MCQA) MMLU (Econometrics) Gopher (few-shot, k=5) Accuracy 43 # 1
Multiple Choice Question Answering (MCQA) MMLU (Econometrics) GAL 120B (zero-shot) Accuracy 42.1 # 2
Multiple Choice Question Answering (MCQA) MMLU (Econometrics) BLOOM (few-shot, k=5) Accuracy 23.7 # 4
Multiple Choice Question Answering (MCQA) MMLU (Econometrics) OPT (few-shot, k=5) Accuracy 21 # 5
Multiple Choice Question Answering (MCQA) MMLU (Electrical Engineer) Gopher (few-shot, k=5) Accuracy 60 # 3
Multiple Choice Question Answering (MCQA) MMLU (Electrical Engineer) BLOOM (few-shot, k=5) Accuracy 32.4 # 5
Multiple Choice Question Answering (MCQA) MMLU (Electrical Engineer) Chinchilla (few-shot, k=5) Accuracy 62.1 # 2
Multiple Choice Question Answering (MCQA) MMLU (Electrical Engineer) GAL 120B (zero-shot) Accuracy 62.8 # 1
Multiple Choice Question Answering (MCQA) MMLU (Electrical Engineer) OPT (few-shot, k=5) Accuracy 36.6 # 4
Multiple Choice Question Answering (MCQA) MMLU (Elementary Mathematics) Gopher (few-shot, k=5) Accuracy 33.6 # 3
Multiple Choice Question Answering (MCQA) MMLU (Elementary Mathematics) BLOOM (few-shot, k=5) Accuracy 27.6 # 4
Multiple Choice Question Answering (MCQA) MMLU (Elementary Mathematics) Chinchilla (few-shot, k=5) Accuracy 41.5 # 1
Multiple Choice Question Answering (MCQA) MMLU (Elementary Mathematics) OPT (few-shot, k=5) Accuracy 25.7 # 5
Multiple Choice Question Answering (MCQA) MMLU (Elementary Mathematics) GAL 120B (zero-shot) Accuracy 38.1 # 2
Multiple Choice Question Answering (MCQA) MMLU (Formal Logic) BLOOM (few-shot, k=5) Accuracy 26.2 # 5
Multiple Choice Question Answering (MCQA) MMLU (Formal Logic) Gopher (few-shot, k=5) Accuracy 35.7 # 1
Multiple Choice Question Answering (MCQA) MMLU (Formal Logic) GAL 120B (zero-shot) Accuracy 32.5 # 3
Multiple Choice Question Answering (MCQA) MMLU (Formal Logic) OPT (few-shot, k=5) Accuracy 29.4 # 4
Multiple Choice Question Answering (MCQA) MMLU (Formal Logic) Chinchilla (few-shot, k=5) Accuracy 33.3 # 2
Multiple Choice Question Answering (MCQA) MMLU (High School Biology) OPT (few-shot, k=5) Accuracy 27.7 # 5
Multiple Choice Question Answering (MCQA) MMLU (High School Biology) BLOOM (few-shot, k=5) Accuracy 29.4 # 4
Multiple Choice Question Answering (MCQA) MMLU (High School Biology) Gopher (few-shot, k=5) Accuracy 71.3 # 2
Multiple Choice Question Answering (MCQA) MMLU (High School Biology) Chinchilla (few-shot, k=5) Accuracy 80.3 # 1
Multiple Choice Question Answering (MCQA) MMLU (High School Biology) GAL 120B (zero-shot) Accuracy 69.4 # 3
Multiple Choice Question Answering (MCQA) MMLU (High School Chemistry) OPT (few-shot, k=5) Accuracy 21.7 # 4
Multiple Choice Question Answering (MCQA) MMLU (High School Chemistry) BLOOM (few-shot, k=5) Accuracy 23.2 # 3
Multiple Choice Question Answering (MCQA) MMLU (High School Chemistry) Chinchilla (few-shot, k=5) Accuracy 58.1 # 1
Multiple Choice Question Answering (MCQA) MMLU (High School Chemistry) GAL 120B (zero-shot) Accuracy 47.8 # 2
Multiple Choice Question Answering (MCQA) MMLU (High School Computer Science) GAL 120B (zero-shot) Accuracy 70 # 1
Multiple Choice Question Answering (MCQA) MMLU (High School Computer Science) BLOOM (few-shot, k=5) Accuracy 25 # 5
Multiple Choice Question Answering (MCQA) MMLU (High School Computer Science) Gopher (few-shot, k=5) Accuracy 54 # 3
Multiple Choice Question Answering (MCQA) MMLU (High School Computer Science) OPT (few-shot, k=5) Accuracy 30 # 4
Multiple Choice Question Answering (MCQA) MMLU (High School Computer Science) Chinchilla (few-shot, k=5) Accuracy 58 # 2
Multiple Choice Question Answering (MCQA) MMLU (High School Mathematics) Chinchilla (few-shot, k=5) Accuracy 31.9 # 2
Multiple Choice Question Answering (MCQA) MMLU (High School Mathematics) GAL 120B (zero-shot) Accuracy 32.6 # 1
Multiple Choice Question Answering (MCQA) MMLU (High School Mathematics) Gopher (few-shot, k=5) Accuracy 23.7 # 5
Multiple Choice Question Answering (MCQA) MMLU (High School Mathematics) BLOOM (few-shot, k=5) Accuracy 27 # 3
Multiple Choice Question Answering (MCQA) MMLU (High School Mathematics) OPT (few-shot, k=5) Accuracy 24.4 # 4
Multiple Choice Question Answering (MCQA) MMLU (High School Physics) OPT (few-shot, k=5) Accuracy 29.8 # 3
Multiple Choice Question Answering (MCQA) MMLU (High School Physics) Chinchilla (few-shot, k=5) Accuracy 36.4 # 1
Multiple Choice Question Answering (MCQA) MMLU (High School Physics) BLOOM (few-shot, k=5) Accuracy 25.2 # 4
Multiple Choice Question Answering (MCQA) MMLU (High School Physics) GAL 120B (zero-shot) Accuracy 33.8 # 2
Multiple Choice Question Answering (MCQA) MMLU (High School Statistics) Chinchilla (few-shot, k=5) Accuracy 58.8 # 1
Multiple Choice Question Answering (MCQA) MMLU (High School Statistics) BLOOM (few-shot, k=5) Accuracy 19.4 # 5
Multiple Choice Question Answering (MCQA) MMLU (High School Statistics) GAL 120B (zero-shot) Accuracy 41.2 # 4
Multiple Choice Question Answering (MCQA) MMLU (High School Statistics) OPT (few-shot, k=5) Accuracy 43.5 # 3
Multiple Choice Question Answering (MCQA) MMLU (High School Statistics) Gopher (few-shot, k=5) Accuracy 50 # 2
Multiple Choice Question Answering (MCQA) MMLU (Machine Learning) Chinchilla (few-shot, k=5) Accuracy 41.1 # 1
Multiple Choice Question Answering (MCQA) MMLU (Machine Learning) GAL 120B (zero-shot) Accuracy 38.4 # 2
Multiple Choice Question Answering (MCQA) MMLU (Machine Learning) BLOOM (few-shot, k=5) Accuracy 25 # 4
Multiple Choice Question Answering (MCQA) MMLU (Machine Learning) OPT (few-shot, k=5) Accuracy 28.6 # 3
Mathematical Reasoning MMLU (Mathematics) GAL 1.3B Accuracy 27.1 # 10
Mathematical Reasoning MMLU (Mathematics) Chinchilla (5-shot) Accuracy 35.7 # 4
Mathematical Reasoning MMLU (Mathematics) GAL 30B <work> Accuracy 37.1 # 2
Mathematical Reasoning MMLU (Mathematics) GAL 6.7B <work> Accuracy 28 # 9
Mathematical Reasoning MMLU (Mathematics) GAL 1.3B <work> Accuracy 24.6 # 13
Mathematical Reasoning MMLU (Mathematics) GAL 120B Accuracy 35.8 # 3
Mathematical Reasoning MMLU (Mathematics) GAL 30B Accuracy 29.9 # 7
Mathematical Reasoning MMLU (Mathematics) BLOOM (5-shot) Accuracy 26.4 # 12
Mathematical Reasoning MMLU (Mathematics) OPT (5-shot) Accuracy 26.7 # 11
Mathematical Reasoning MMLU (Mathematics) GAL 6.7B Accuracy 29.2 # 8
Mathematical Reasoning MMLU (Mathematics) GAL 120B <work> Accuracy 41.3 # 1
Mathematical Reasoning MMLU (Mathematics) Gopher (5-shot) Accuracy 30.6 # 6
Multiple Choice Question Answering (MCQA) MMLU (Medical Genetics) BLOOM (few-shot, k=5) Accuracy 36 # 7
Multiple Choice Question Answering (MCQA) MMLU (Medical Genetics) Chinchilla (few-shot, k=5) Accuracy 69 # 5
Multiple Choice Question Answering (MCQA) MMLU (Medical Genetics) GAL 30B (zero-shot) Accuracy 70 # 4
Multiple Choice Question Answering (MCQA) MMLU (Medical Genetics) GAL 120B (zero-shot) Accuracy 68 # 6
Multiple Choice Question Answering (MCQA) MMLU (Medical Genetics) OPT (few-shot, k=5) Accuracy 35 # 8
Molecular Property Prediction MoleculeNet GAL 125M AUC 0.581 # 5
Molecular Property Prediction MoleculeNet GAL 30B AUC 0.69 # 2
Molecular Property Prediction MoleculeNet GAL 1.3B AUC 0.619 # 4
Molecular Property Prediction MoleculeNet Uni-Mol AUC 0.77 # 1
Molecular Property Prediction MoleculeNet GAL 6.7B AUC 0.64 # 3
Protein Function Prediction PaenSeq GAL 120B ROUGE-L 0.272 # 1
Protein Structure Prediction PaenSeq GAL 6.7B Validation perplexity 7.76 # 3
Protein Function Prediction PaenSeq GAL 125M ROUGE-L 0.073 # 5
Protein Function Prediction PaenSeq GAL 1.3B ROUGE-L 0.084 # 4
Protein Structure Prediction PaenSeq GAL 30B Validation perplexity 4.28 # 2
Protein Function Prediction PaenSeq GAL 6.7B ROUGE-L 0.137 # 3
Protein Structure Prediction PaenSeq GAL 125M Validation perplexity 16.35 # 5
Protein Function Prediction PaenSeq GAL 30B ROUGE-L 0.196 # 2
Protein Annotation PaenSeq GAL 1.3B F1 score 0.26 # 4
Protein Annotation PaenSeq GAL 125M F1 score 0.093 # 5
Protein Structure Prediction PaenSeq GAL 1.3B Validation perplexity 12.53 # 4
Protein Annotation PaenSeq GAL 120B F1 score 0.545 # 1
Protein Annotation PaenSeq GAL 6.7B F1 score 0.333 # 3
Protein Structure Prediction PaenSeq GAL 120B Validation perplexity 3.14 # 1
Protein Annotation PaenSeq GAL 30B F1 score 0.426 # 2
Question Answering PubMedQA GAL 120B (zero-shot) Accuracy 77.6 # 8
Question Answering PubMedQA OPT (zero-shot) Accuracy 70.2 # 18
Question Answering PubMedQA BLOOM (zero-shot) Accuracy 73.6 # 15
Citation Prediction PWC Citations Sparse Retriever Accuracy 30.9 # 4
Citation Prediction PWC Citations Dense Retriever (fine-tuned) Accuracy 27.6 # 5
Citation Prediction PWC Citations Dense Retriever Accuracy 16.4 # 7
Citation Prediction PWC Citations GAL 120B Accuracy 51.9 # 1
Citation Prediction PWC Citations GAL 30B Accuracy 44.7 # 2
Citation Prediction PWC Citations GAL 6.7B Accuracy 32 # 3
Citation Prediction PWC Citations GAL 1.3B Accuracy 18.5 # 6
Citation Prediction PWC Citations GAL 125M Accuracy 7 # 8
Molecular Property Prediction SIDER GAL 6.7B ROC-AUC 55.9 # 15
Molecular Property Prediction SIDER GAL 1.3B ROC-AUC 54.0 # 17
Molecular Property Prediction SIDER GAL 120B ROC-AUC 63.2 # 11
Molecular Property Prediction SIDER GAL 30B ROC-AUC 61.3 # 13
Molecular Property Prediction SIDER GAL 125M ROC-AUC 55.9 # 15
Bias Detection StereoSet GPT-3 (text-davinci-002) ICAT Score 60.8 # 10; LMS 77.6 # 1; SS 60.8 # 1
Bias Detection StereoSet OPT 175B ICAT Score 60 # 11; LMS 74.8 # 3; SS 59.9 # 2
Bias Detection StereoSet GAL 120B ICAT Score 65.6 # 8; LMS 75 # 2; SS 56.2 # 3
Molecular Property Prediction Tox21 GAL 125M ROC-AUC 54.3 # 17
Molecular Property Prediction Tox21 GAL 6.7B ROC-AUC 63.9 # 15
Molecular Property Prediction Tox21 GAL 1.3B ROC-AUC 60.6 # 16
Molecular Property Prediction Tox21 GAL 120B ROC-AUC 68.9 # 13
Molecular Property Prediction Tox21 GAL 30B ROC-AUC 68.5 # 14
Molecular Property Prediction Tox21 Uni-Mol ROC-AUC 79.6 # 3
Question Answering TruthfulQA GAL 125M MC1 0.19 # 20
Question Answering TruthfulQA OPT 175B MC1 0.21 # 17
Question Answering TruthfulQA GAL 6.7B MC1 0.19 # 20
Question Answering TruthfulQA GAL 30B MC1 0.24 # 12
Question Answering TruthfulQA GAL 1.3B MC1 0.19 # 20
Question Answering TruthfulQA GAL 120B MC1 0.26 # 10
Protein Annotation UniProtSeq GAL 1.3B F1 score 0.219 # 4
Protein Annotation UniProtSeq GAL 120B F1 score 0.487 # 1
Protein Annotation UniProtSeq GAL 125M F1 score 0.152 # 5
Protein Structure Prediction UniProtSeq GAL 120B Validation perplexity 5.54 # 1
Protein Annotation UniProtSeq GAL 30B F1 score 0.408 # 2
Protein Structure Prediction UniProtSeq GAL 30B Validation perplexity 8.23 # 2
Protein Structure Prediction UniProtSeq GAL 6.7B Validation perplexity 11.58 # 3
Protein Structure Prediction UniProtSeq GAL 1.3B Validation perplexity 15.82 # 4
Protein Function Prediction UniProtSeq GAL 120B ROUGE-L 0.252 # 1
Protein Structure Prediction UniProtSeq GAL 125M Validation perplexity 19.05 # 5
Protein Function Prediction UniProtSeq GAL 125M ROUGE-L 0.061 # 5
Protein Function Prediction UniProtSeq GAL 1.3B ROUGE-L 0.079 # 4
Protein Function Prediction UniProtSeq GAL 6.7B ROUGE-L 0.111 # 3
Protein Function Prediction UniProtSeq GAL 30B ROUGE-L 0.186 # 2
Protein Annotation UniProtSeq GAL 6.7B F1 score 0.251 # 3
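
The StereoSet rows in the table above report three numbers per model: a language modeling score (LMS), a stereotype score (SS), and a combined ICAT score. As a hedged sketch, assuming the standard StereoSet definition ICAT = LMS × min(SS, 100 − SS) / 50 (the formula comes from the StereoSet benchmark and is not stated on this page), the snippet below recomputes ICAT from the tabulated LMS and SS values. The names `icat` and `rows` are illustrative only.

```python
def icat(lms: float, ss: float) -> float:
    """Idealized CAT score, assuming the standard StereoSet definition.

    lms: language modeling score in [0, 100]
    ss:  stereotype score in [0, 100]; 50 means no measured preference
    """
    return lms * min(ss, 100.0 - ss) / 50.0


# (LMS, SS, reported ICAT) copied from the StereoSet rows of the table.
rows = {
    "GPT-3 (text-davinci-002)": (77.6, 60.8, 60.8),
    "OPT 175B": (74.8, 59.9, 60.0),
    "GAL 120B": (75.0, 56.2, 65.6),
}

for model, (lms, ss, reported) in rows.items():
    print(f"{model}: recomputed {icat(lms, ss):.1f}, reported {reported}")
```

Under this assumption the recomputed scores agree with the reported ones to within about 0.1, which is consistent with the LMS and SS inputs having been rounded to one decimal place. Because a stereotype score near 50 is ideal, the lower SS for GAL 120B is what raises its ICAT above the two baselines despite a similar LMS.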

Methods