WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated exceptional performance in code-related tasks. However, most existing models are solely pre-trained on extensive raw code data without instruction fine-tuning. In this paper, we introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning by adapting the Evol-Instruct method to the domain of code. Through comprehensive experiments on four prominent code generation benchmarks (HumanEval, HumanEval+, MBPP, and DS-1000), we demonstrate the strong capabilities of our model. It surpasses all other open-source Code LLMs by a substantial margin. Moreover, our model even outperforms the largest closed-source LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+. Our code, model weights, and data are public at https://github.com/nlpxucan/WizardLM
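The abstract names Evol-Instruct but does not spell out its loop. The sketch below illustrates the general idea under stated assumptions: an LLM repeatedly rewrites seed instructions into harder variants, weak rewrites are filtered out, and all surviving instructions are pooled as candidate fine-tuning data. The `rewrite` callable is a hypothetical stand-in for the LLM call, the `DIRECTIVES` wording is illustrative and not quoted from the paper, and the simple filter is an assumption of this sketch rather than the authors' elimination rule.

```python
import random
from typing import Callable, List, Optional

# Hypothetical evolution directives for illustration only; they are not
# quoted from the WizardCoder paper's prompts.
DIRECTIVES: List[str] = [
    "Add new constraints and requirements to the task.",
    "Replace a common requirement with a less common, more specific one.",
    "Require more reasoning steps to solve the task.",
    "Ask for stricter time or space complexity.",
]


def evolve_instructions(
    seeds: List[str],
    rewrite: Callable[[str, str], str],
    rounds: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Evolve seed instructions for several rounds and return all of them.

    `rewrite(directive, instruction)` stands in for an LLM call that makes
    `instruction` harder according to `directive` (an assumption here).
    """
    if rng is None:
        rng = random.Random(0)  # seeded so runs are reproducible
    pool = list(seeds)      # seeds plus every surviving evolution
    frontier = list(seeds)  # instructions to evolve in the next round
    for _ in range(rounds):
        survivors = []
        for instruction in frontier:
            directive = rng.choice(DIRECTIVES)
            candidate = rewrite(directive, instruction).strip()
            # Minimal filter: discard empty or unchanged rewrites.
            if candidate and candidate != instruction:
                survivors.append(candidate)
        pool.extend(survivors)
        frontier = survivors
    return pool


if __name__ == "__main__":
    # Toy stand-in for the LLM: append the chosen directive to the task.
    def toy_rewrite(directive: str, instruction: str) -> str:
        return f"{instruction} {directive}"

    for item in evolve_instructions(["Reverse a string."], toy_rewrite, rounds=2):
        print(item)
```

Keeping every round's output in `pool` (rather than only the last round) is a design choice of this sketch; it yields a mix of difficulty levels for fine-tuning.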
| Task | Dataset | Model | Metric | Value (%) | Global rank |
|---|---|---|---|---|---|
| Code Generation | CodeContests | WizardCoder-15B | Test set pass@1 | 1.11 | #4 |
| Code Generation | CodeContests | WizardCoder-15B | Test set pass@5 | 3.18 | #4 |
| Code Generation | CodeContests | WizardCoder-15B | Val set pass@1 | 1.98 | #4 |
| Code Generation | CodeContests | WizardCoder-15B | Val set pass@5 | 3.27 | #3 |
| Code Generation | HumanEval | WizardCoder-15B | Pass@1 | 57.30 | #41 |
| Code Generation | MBPP | WizardCoder-15B | Accuracy | 51.8 | #49 |
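The pass@1 and pass@5 figures above measure how often at least one of k generated programs passes a problem's tests. The listing does not say how many samples per problem were drawn or how they were scored, so the following is only a sketch of one common choice: the unbiased estimator introduced with HumanEval (Chen et al., 2021), assuming n samples per problem of which c pass. The function names `pass_at_k` and `mean_pass_at_k` are illustrative.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct.

    Estimates 1 - P(all k samples chosen from the n are incorrect).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two hypothetical problems: 3/10 samples correct, and 0/5 correct.
    print(mean_pass_at_k([(10, 3), (5, 0)], k=1))
```

With k = 1 the estimator reduces to c / n per problem, so pass@1 is simply the average fraction of correct samples; the combinatorial form matters for k > 1, such as the pass@5 rows.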