RepoEval is a benchmark designed for evaluating repository-level code completion systems. While most existing benchmarks focus on single-file tasks, RepoEval addresses the assessment gap for more complex, real-world scenarios in which the relevant context is scattered across multiple files of a repository. It was introduced alongside the RepoCoder framework² and should not be confused with RepoBench¹, a related benchmark covering Python and Java with separate retrieval (RepoBench-R), next-line completion (RepoBench-C), and end-to-end pipeline (RepoBench-P) tasks. Here are the key details about RepoEval:

  1. Tasks (three granularities; scoring is sketched after this list):

    • Line completion: predict the next line of code from the in-file and cross-file context.
    • API invocation completion: complete an invocation of an API defined elsewhere in the repository.
    • Function body completion: generate an entire function body, whose correctness is checked against the repository's unit tests².
  2. Language Supported:

    • RepoEval is constructed from Python repositories².
  3. Purpose:

    • RepoEval is built from recent, high-quality GitHub repositories so that its test cases are unlikely to overlap with model training data, enabling a fair comparison of repository-level completion systems and encouraging continuous improvement².
  4. Availability:

    • RepoEval is publicly available as part of the RepoCoder code release².
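
A minimal sketch of the two string-level metrics typically reported for the line- and API-completion tasks, exact match (EM) and edit similarity (ES). The standard library's SequenceMatcher is an assumption here, standing in for the Levenshtein-based similarity used in practice:

```python
# Hedged sketch of RepoEval-style scoring for line/API completion:
# Exact Match (EM) and Edit Similarity (ES). difflib's ratio() is an
# assumed stand-in for a Levenshtein-based similarity measure.
from difflib import SequenceMatcher


def exact_match(prediction: str, reference: str) -> bool:
    """EM: the completion equals the ground truth after trimming whitespace."""
    return prediction.strip() == reference.strip()


def edit_similarity(prediction: str, reference: str) -> float:
    """ES: normalized similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, prediction.strip(), reference.strip()).ratio()


# Example: average both metrics over a batch of (prediction, reference) pairs.
pairs = [
    ("return os.path.join(root, name)", "return os.path.join(root, name)"),
    ("for item in items:", "for entry in items:"),
]
em = sum(exact_match(p, r) for p, r in pairs) / len(pairs)
es = sum(edit_similarity(p, r) for p, r in pairs) / len(pairs)
print(f"EM: {em:.2f}  ES: {es:.2f}")
```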

In summary, RepoEval provides an evaluation framework for repository-level code completion, letting researchers and developers measure how well a system exploits cross-file context rather than treating each file in isolation.
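
For concreteness, here is a hedged sketch of the kind of system RepoEval is designed to exercise: the iterative retrieve-then-generate loop described in the RepoCoder paper². The `retrieve` and `generate` callables are hypothetical stand-ins, not a real retriever or model API:

```python
# Hedged sketch of an iterative retrieve-then-generate completion loop in the
# style of RepoCoder. `retrieve` and `generate` are hypothetical stand-ins.
from typing import Callable, List


def iterative_completion(
    unfinished_code: str,
    retrieve: Callable[[str], List[str]],        # query -> similar snippets from the repo
    generate: Callable[[str, List[str]], str],   # (code, context) -> predicted completion
    iterations: int = 2,
) -> str:
    query = unfinished_code
    completion = ""
    for _ in range(iterations):
        context = retrieve(query)                        # gather cross-file context
        completion = generate(unfinished_code, context)  # complete with that context
        query = unfinished_code + "\n" + completion      # let the draft refine retrieval
    return completion


# Toy stand-ins so the sketch runs end to end (one round shown).
snippets = ["    return os.path.join(root, name)"]
print(iterative_completion(
    "def join_path(root, name):",
    retrieve=lambda query: snippets,
    generate=lambda code, context: context[0] if context else "pass",
    iterations=1,
))
```

On RepoEval, a loop like this is scored with the EM/ES metrics sketched above for line and API completion, and by executing the repository's unit tests for function bodies².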

(1) RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems. https://arxiv.org/abs/2306.03091. (2) RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. https://arxiv.org/abs/2303.12570. (3) GitHub - Leolty/repobench. https://github.com/Leolty/repobench.
