CodeContests is a competitive programming dataset for machine learning. It was used to train AlphaCode.
37 PAPERS • 1 BENCHMARK
PyTorrent contains 218,814 Python package libraries from the PyPI and Anaconda environments. These sources were chosen because earlier studies have shown that much of the code is redundant, whereas Python packages from these environments are of higher quality and well documented. PyTorrent enables users (such as data scientists and students) to build off-the-shelf machine learning models directly without spending months of effort on large infrastructure.
4 PAPERS • NO BENCHMARKS YET
To assess a model’s ability to create microcontroller-driven electronic devices, we developed a benchmark, MICRO25, that includes 25 tasks intended for the common Arduino microcontroller ecosystem. These tasks, shown in Table 2, span five core categories: input, interface protocols, output, sensors, and logic. Each task tests either a specific fundamental competency required to build basic microcontroller-driven electronic devices or the integration of several competencies into larger design flows.
1 PAPER • NO BENCHMARKS YET
Syntax-Aware Fill-in-the-Middle (SAFIM) is a benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task. SAFIM has three subtasks: Algorithmic Block Completion, Control-Flow Expression Completion, and API Function Call Completion. SAFIM is sourced from code submitted from April 2022 to January 2023 to minimize the impact of data contamination on evaluation results.
1 PAPER • 1 BENCHMARK