DS-1000

Introduced by Lai et al. in DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

DS-1000 is a code generation benchmark of one thousand data science problems spanning seven Python libraries (NumPy, Pandas, SciPy, Scikit-learn, PyTorch, TensorFlow, and Matplotlib). It (1) reflects diverse, realistic, and practical use cases, (2) provides a reliable evaluation metric, and (3) defends against memorization by perturbing questions.
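
To make the "reliable metric" concrete, the following is a minimal sketch of execution-based checking, assuming a generated solution is accepted only if it passes test cases and, optionally, satisfies a constraint on its surface form (here, "no explicit for loops"). This is not the benchmark's actual harness: the names `passes_tests`, `uses_for_loop`, `is_correct`, and the `solve` entry point are hypothetical and chosen for illustration only.

```python
import ast


def passes_tests(candidate_src, test_cases, func_name="solve"):
    """Run the candidate source and compare its outputs with expected values.

    `func_name` is a hypothetical entry point assumed for this sketch.
    """
    namespace = {}
    try:
        exec(candidate_src, namespace)  # execute the generated code
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing entry point, or runtime failures all count as wrong.
        return False


def uses_for_loop(candidate_src):
    """Surface-form check: True if the source contains a `for` statement.

    Comprehensions are not `for` statements and are not flagged.
    """
    tree = ast.parse(candidate_src)
    return any(isinstance(node, (ast.For, ast.AsyncFor)) for node in ast.walk(tree))


def is_correct(candidate_src, test_cases, forbid_loops=False):
    """Accept a candidate only if it passes the tests and the surface-form check."""
    if not passes_tests(candidate_src, test_cases):
        return False
    # The source is known to parse here, because it executed successfully.
    if forbid_loops and uses_for_loop(candidate_src):
        return False
    return True


# Illustrative data, not taken from the benchmark.
tests = [(([1, 2, 3],), 6), (([],), 0)]
vectorized = "def solve(xs):\n    return sum(xs)\n"
looped = "def solve(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"

print(is_correct(vectorized, tests, forbid_loops=True))
print(is_correct(looped, tests, forbid_loops=True))
```

Under these assumptions, both candidates compute the right answers, but only the first is accepted when loops are forbidden. Combining the two checks is meant to reduce the chance that an incorrect or off-specification solution is counted as correct.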

Source: DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
