A new large dataset with over 100,000 examples consisting of Java classes from online code repositories, and develop a new encoder-decoder architecture that models the interaction between the method documentation and the class environment.
34 PAPERS • 1 BENCHMARK
Pseudocode-to-Code (SPoC) is a program synthesis dataset, containing 18,356 programs with human-authored pseudocode and test cases.
13 PAPERS • 2 BENCHMARKS
SketchGraphs is a dataset of 15 million sketches extracted from real-world CAD models intended to facilitate research in both ML-aided design and geometric program induction. Each sketch is represented as a geometric constraint graph where edges denote designer-imposed geometric relationships between primitives, the nodes of the graph.
11 PAPERS • NO BENCHMARKS YET
xCodeEval is one of the largest executable multilingual multitask benchmarks consisting of 17 programming languages with execution-level parallelism. It features a total of seven tasks involving code understanding, generation, translation, and retrieval, and it employs an execution-based evaluation instead of traditional lexical approaches. It also provides a test-case-based multilingual code execution engine, ExecEval that supports all the programming languages in xCodeEval.
5 PAPERS • NO BENCHMARKS YET
https://arxiv.org/abs/2106.06086
1 PAPER • NO BENCHMARKS YET