The ManySStuBs4J corpus is a collection of simple fixes to Java bugs, designed for evaluating program repair techniques. We collect all bug-fixing changes using the SZZ heuristic, and then filter these to obtain a data set of small bug fix changes. These are single statement fixes, classified where possible into one of 16 syntactic templates which we call SStuBs. The dataset contains simple statement bugs mined from open-source Java projects hosted in GitHub. There are two variants of the dataset. One mined from the 100 Java Maven Projects and one mined from the top 1000 Java Projects.
The dataest contains 153,652 single statement bugfix changes mined from 1,000 popular open-source Java projects, annotated by whether they match any of a set of 16 bug templates, inspired by state-of-the-art program repair techniques.