Model Card and Evaluations for Claude Models

Technical Report 2023 · Anthropic

This report includes the model card [1] for Claude models, focusing on Claude 2, along with the results of a range of safety, alignment, and capabilities evaluations. We have been iterating on the training and evaluation of Claude-type models since our first work on Reinforcement Learning from Human Feedback (RLHF) [2]; the newest Claude 2 model represents a continuous evolution from those early and less capable ‘helpful and harmless’ language assistants. This report is not intended to be a scientific paper, since most aspects of training and evaluating these models have been documented in our research papers. These include papers on preference modeling [3], reinforcement learning from human feedback for helpful and harmless models [2], red teaming language models [4], measuring the representation of subjective global values in language models [5], honesty (i.e., exploring language models’ ability to recognize what they know) [6], evaluating language models with language-model-generated tests [7], moral self-correction [8], and Constitutional AI [9]. We also discussed Claude’s specific constitution in a recent blog post [10]. Our work using human evaluations to test model safety is most thoroughly documented in our paper “Red Teaming Language Models to Reduce Harms” [4], while our recent work on automated safety evaluation is “Discovering Language Model Behaviors with Model-Written Evaluations” [7]. This report is also not comprehensive: we expect to release new findings as we continue our research and evaluations of frontier models. However, we hope it provides useful insight into Claude 2’s capabilities and limitations.
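
As background for the RLHF and preference-modeling papers cited above [2, 3], the sketch below shows the standard pairwise ranking objective used to train a preference (reward) model: the completion a human preferred should receive a higher scalar reward than the rejected one. This is a minimal illustration of the published technique, not Anthropic's implementation; the function name and tensor shapes are our own.

```python
import torch
import torch.nn.functional as F

def preference_loss(rewards_chosen: torch.Tensor,
                    rewards_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss for training a preference (reward) model.

    Each element of `rewards_chosen` / `rewards_rejected` is the scalar
    reward assigned to the human-preferred and human-rejected completion
    of the same prompt. Minimizing the loss pushes the preferred
    completion's reward above the rejected one's.
    """
    return -F.logsigmoid(rewards_chosen - rewards_rejected).mean()

# Toy usage: a batch of 4 comparison pairs with random rewards.
chosen = torch.randn(4)
rejected = torch.randn(4)
print(preference_loss(chosen, rejected))
```

Minimizing this loss is equivalent to maximum-likelihood estimation under a Bradley-Terry model of the human comparisons.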


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Common Sense Reasoning | ARC (Challenge) | Claude Instant 1.1 (few-shot, k=5) | Accuracy | 85.7 | #11 |
| Common Sense Reasoning | ARC (Challenge) | Claude 1.3 (few-shot, k=5) | Accuracy | 90 | #4 |
| Common Sense Reasoning | ARC (Challenge) | Claude 2 (few-shot, k=5) | Accuracy | 91 | #3 |
| Arithmetic Reasoning | GSM8K | Claude Instant 1.1 (0-shot chain-of-thought) | Accuracy | 80.9 | #59 |
| Arithmetic Reasoning | GSM8K | Claude 1.3 (0-shot chain-of-thought) | Accuracy | 85.2 | #39 |
| Arithmetic Reasoning | GSM8K | Claude 2 (0-shot chain-of-thought) | Accuracy | 88 | #28 |
| Code Generation | HumanEval | Claude Instant 1.1 | Pass@1 | 52.8 | #41 |
| Code Generation | HumanEval | Claude 1.3 | Pass@1 | 56 | #38 |
| Code Generation | HumanEval | Claude 2 | Pass@1 | 71.2 | #22 |
| Multi-task Language Understanding | MMLU | Claude Instant 1.1 (5-shot) | Average (%) | 73.4 | #22 |
| Multi-task Language Understanding | MMLU | Claude 1.3 (5-shot) | Average (%) | 77 | #14 |
| Multi-task Language Understanding | MMLU | Claude 2 (5-shot) | Average (%) | 78.5 | #11 |
| Question Answering | QuALITY | Claude Instant 1.1 (5-shot) | Accuracy | 80.5 | #4 |
| Question Answering | QuALITY | Claude 1.3 (5-shot) | Accuracy | 84.1 | #1 |
| Question Answering | QuALITY | Claude 2 (5-shot) | Accuracy | 83.2 | #2 |
| Bug Fixing | SWE-bench | Claude 2 | Resolved (unassisted) | 1.96% | #1 |
| Bug Fixing | SWE-bench | Claude 2 | Resolved (assisted) | 4.80% | #1 |
| Question Answering | TriviaQA | Claude Instant 1.1 (few-shot, k=5) | EM | 78.9 | #9 |
| Question Answering | TriviaQA | Claude 1.3 (few-shot, k=5) | EM | 86.7 | #3 |
| Question Answering | TriviaQA | Claude 2 (few-shot, k=5) | EM | 87.5 | #1 |
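
For context on the HumanEval rows above: Pass@1 is conventionally computed with the unbiased pass@k estimator introduced alongside the benchmark (Chen et al., 2021). The snippet below reproduces that estimator; that this report used this exact script is an assumption on our part.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper.

    n: total completions sampled per problem
    c: number of those completions that pass the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # 1 - P(all k drawn samples fail), computed as a numerically stable product
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 50 of which pass; estimate pass@1.
print(pass_at_k(n=200, c=50, k=1))  # ≈ 0.25
```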
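
The TriviaQA rows report EM (exact match), which credits a prediction that equals any accepted answer alias after light normalization. The sketch below uses the common SQuAD-style normalization; the report does not specify its scoring script, so the helper names and normalization details here are illustrative.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, aliases: list[str]) -> bool:
    """TriviaQA-style EM: correct if the prediction matches any alias."""
    pred = normalize(prediction)
    return any(pred == normalize(alias) for alias in aliases)

print(exact_match("The Nile", ["Nile", "River Nile"]))  # True
```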
