Claude 3.5 Sonnet Model Card Addendum
This addendum to our Claude 3 Model Card describes Claude 3.5 Sonnet, a new model which outperforms our previous most capable model, Claude 3 Opus, while operating faster and at a lower cost. Claude 3.5 Sonnet offers improved capabilities, including better coding and visual processing. Since it is an evolution of the Claude 3 model family, we are providing an addendum rather than a new model card. We provide updated key evaluations and results from our safety testing.
Results from the Paper
Ranked #1 on Multi-task Language Understanding on MMLU (using extra training data)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Code Generation | HumanEval | GPT-4o (0-shot) | Pass@1 | 90.2 | #14
Multi-task Language Understanding | MMLU | Claude 3.5 Sonnet (5-shot) | Average (%) | 88.7 | #1
Visual Question Answering | MM-Vet | Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | GPT-4 score | 74.2±0.2 | #5
Visual Question Answering | MM-Vet v2 | Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | GPT-4 score | 71.8±0.2 | #3
MMR total | MRR-Benchmark | Claude 3.5 Sonnet | Total Column Score | 463 | #1