CodeT: Code Generation with Generated Tests

21 Jul 2022 · Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

The task of generating code solutions for a given programming problem can benefit from pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is selecting the most appropriate solution from the many samples a model generates. A natural way to evaluate the quality and correctness of a code solution is to run it against a set of test cases, but manually creating such test cases is costly and time-consuming. In this paper, we propose a novel method, CodeT, that leverages the same pre-trained language models to automatically generate test cases for the code samples, reducing human effort and increasing the coverage of test scenarios. CodeT then executes the code samples using the generated test cases and performs dual execution agreement, which considers both the consistency of the outputs with the generated test cases and the agreement of the outputs with other code samples. We conduct comprehensive experiments on four benchmarks (HumanEval, MBPP, APPS, and CodeContests) using five pre-trained language models of varying sizes and capabilities. Our results show that CodeT significantly improves code solution selection over previous methods, achieving remarkable and consistent gains across models and benchmarks. For instance, CodeT improves pass@1 on HumanEval to 65.8%, an absolute improvement of 18.8% over the code-davinci-002 model and of more than 20% over the previous state-of-the-art results.
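To make the selection step concrete, here is a minimal Python sketch of the dual execution agreement idea under simplifying assumptions: code samples are grouped by the exact set of generated tests they pass, and each group is scored by the product of its size and the number of tests it passes. The `run_passes` executor is a hypothetical placeholder for sandboxed execution; the paper's actual procedure includes further details beyond this sketch.

```python
from collections import defaultdict
from typing import Callable

def select_by_dual_agreement(
    code_samples: list[str],
    test_cases: list[str],
    run_passes: Callable[[str, str], bool],  # hypothetical sandboxed executor
) -> str:
    """Pick a code sample via dual execution agreement (simplified sketch).

    Samples that pass exactly the same generated tests form a consensus set;
    a set's score is (#code samples in it) * (#tests it passes).
    """
    consensus: dict[frozenset, list[str]] = defaultdict(list)
    for code in code_samples:
        passed = frozenset(t for t in test_cases if run_passes(code, t))
        consensus[passed].append(code)
    # The highest-scoring consensus set wins; return one representative sample.
    best_tests, best_codes = max(
        consensus.items(), key=lambda kv: len(kv[1]) * len(kv[0])
    )
    return best_codes[0]
```

A sample that agrees with many other samples and passes many generated tests thus outranks an idiosyncratic sample that happens to pass a few tests on its own.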


Results from the Paper


Ranked #1 on Code Generation on APPS (Introductory Pass@1 metric)
| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Code Generation | APPS | code-davinci-002 175B (CodeT) | Introductory Pass@1 | 47.3% | #1 |
| Code Generation | APPS | code-davinci-002 175B (CodeT) | Interview Pass@1 | 14.3% | #2 |
| Code Generation | APPS | code-davinci-002 175B (CodeT) | Competition Pass@1 | 6.2% | #2 |
| Code Generation | APPS | code-davinci-002 175B | Introductory Pass@1 | 29.3% | #3 |
| Code Generation | APPS | code-davinci-002 175B | Interview Pass@1 | 6.4% | #4 |
| Code Generation | APPS | code-davinci-002 175B | Competition Pass@1 | 2.5% | #4 |
| Code Generation | APPS | code-davinci-002 175B | Introductory Pass@any | 60.9% | #1 |
| Code Generation | APPS | code-davinci-002 175B | Interview Pass@any | 25.4% | #1 |
| Code Generation | APPS | code-davinci-002 175B | Competition Pass@any | 14.5% | #2 |
| Code Generation | HumanEval | CodeGen-Mono 16B (CodeT) | Pass@1 | 36.7 | #66 |
| Code Generation | HumanEval | InCoder 6.7B (CodeT) | Pass@1 | 20.6 | #98 |
| Code Generation | HumanEval | code-davinci-002 175B (CodeT) | Pass@1 | 65.8 | #27 |
| Code Generation | HumanEval | code-davinci-001 175B (CodeT) | Pass@1 | 50.2 | #43 |
| Code Generation | HumanEval | code-cushman-001 12B (CodeT) | Pass@1 | 44.5 | #54 |
| Code Generation | HumanEval | code-cushman-001 12B (CodeT-Iter) | Pass@1 | 45.2 | #50 |
| Code Generation | HumanEval | code-davinci-002 175B (CodeT-Iter) | Pass@1 | 65.2 | #28 |
| Code Generation | MBPP | CodeGen-Mono 16B + CodeT | Accuracy | 49.5 | #46 |
| Code Generation | MBPP | InCoder 6.7B + CodeT | Accuracy | 34.4 | #74 |
| Code Generation | MBPP | code-davinci-002 175B + CodeT | Accuracy | 67.7 | #20 |
| Code Generation | MBPP | code-davinci-001 175B + CodeT | Accuracy | 61.9 | #31 |
| Code Generation | MBPP | code-cushman-001 12B (CodeT) | Accuracy | 55.4 | #39 |
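For reference, the Pass@k numbers above follow the standard unbiased estimator introduced with HumanEval (Chen et al., 2021): generate n samples per problem, count the c correct ones, and estimate the probability that at least one of k drawn samples is correct. A minimal sketch in the numerically stable product form commonly used:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated for a problem
    c: number of samples that pass all tests
    k: budget of samples considered
    """
    if n - c < k:
        return 1.0  # every size-k draw contains at least one correct sample
    # Product form avoids computing huge binomial coefficients directly.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```

Under this metric, Pass@1 for a CodeT row reflects the success rate when a single CodeT-selected sample is submitted per problem.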
