CodeXGLUE is a benchmark dataset and open challenge for code intelligence. It includes a collection of code intelligence tasks and a platform for model evaluation and comparison. CodeXGLUE stands for General Language Understanding Evaluation benchmark for CODE. It includes 14 datasets for 10 diversified code intelligence tasks covering the following scenarios:
162 PAPERS • 15 BENCHMARKS
HumanEval-X is a benchmark for evaluating the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks, such as code generation and translation.
21 PAPERS • NO BENCHMARKS YET
xCodeEval is one of the largest executable multilingual multitask benchmarks consisting of 17 programming languages with execution-level parallelism. It features a total of seven tasks involving code understanding, generation, translation, and retrieval, and it employs an execution-based evaluation instead of traditional lexical approaches. It also provides a test-case-based multilingual code execution engine, ExecEval that supports all the programming languages in xCodeEval.
6 PAPERS • NO BENCHMARKS YET
PyTorrent contains 218,814 Python package libraries from PyPI and Anaconda environment. This is because earlier studies have shown that much of the code is redundant and Python packages from these environments are better in quality and are well-documented. PyTorrent enables users (such as data scientists, students, etc.) to build off the shelf machine learning models directly without spending months of effort on large infrastructure.
4 PAPERS • NO BENCHMARKS YET
The dataset consists of source code and LLVM IR pairs generated from accepted and de-duped programming contest solutions. The dataset is divided into language configs and mode splits. The language can be one of C, C++, D, Fortran, Go, Haskell, Nim, Objective-C, Python, Rust and Swift, indicating the source files' languages. The mode split indicates the compilation mode, which can be wither Size_Optimized or Perf_Optimized.
1 PAPER • NO BENCHMARKS YET