SWE-bench-lite

Introduced by Jimenez et al. in SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

SWE-bench is a dataset that tests a system's ability to resolve GitHub issues automatically. It collects 2,294 issue-pull request pairs from 12 popular Python repositories. A proposed fix is evaluated by running the repository's unit tests, with the post-PR behavior serving as the reference solution.
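The unit-test check can be illustrated with a minimal sketch. It assumes that each instance comes with two lists of test names: tests that fail before the reference PR and pass after it, and tests that pass both before and after. The function name `is_resolved`, its parameters, and the test names in the usage example are hypothetical and chosen for illustration; this is not the official evaluation harness.

```python
from typing import Dict, Iterable


def is_resolved(
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
    results: Dict[str, bool],
) -> bool:
    """Return True if a candidate patch reproduces the post-PR test behavior.

    fail_to_pass: tests expected to go from failing to passing (assumed input).
    pass_to_pass: tests expected to keep passing (assumed input).
    results: maps a test name to True if it passed after applying the patch.
    A test missing from `results` is treated as a failure.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(results.get(name, False) for name in required)


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    outcome = {"test_bugfix": True, "test_existing_feature": True}
    print(is_resolved(["test_bugfix"], ["test_existing_feature"], outcome))
```

Under this reading, a patch that fixes the issue but breaks a previously passing test does not count as resolving the instance.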

SWE-bench Lite is a subset of SWE-bench curated to make evaluation less costly and more accessible. It comprises 300 instances sampled to be more self-contained, with a focus on evaluating functional bug fixes.

License


  • Unknown
