TFix's Code Patches Data

Introduced by Berabi et al. in TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer

The dataset contains more than 100k code patch pairs extracted from open source projects on GitHub. Each pair consists of the erroneous and the fixed version of the same code snippet. Rather than whole files, the snippets cover only the problematic region (the error line plus a few surrounding lines). For each sample, the repository name, the commit id, and the file names are provided, so the complete files can be retrieved if needed.
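As a rough illustration of the sample layout described above, the following Python sketch models one patch pair, cuts a snippet around an error line, and builds a link to the complete file. The class `PatchSample`, its field names, the helper functions, and the default context size are all assumptions made for this example; they are not the dataset's actual schema or the authors' extraction code. The URL follows GitHub's usual `blob/<commit>/<path>` pattern.

```python
from dataclasses import dataclass


@dataclass
class PatchSample:
    # All field names are hypothetical, not the dataset's actual schema.
    repo: str            # repository name, e.g. "owner/project"
    commit_id: str       # commit the snippet was taken from
    file_name: str       # path of the file inside the repository
    erroneous_code: str  # snippet before the fix
    fixed_code: str      # snippet after the fix


def extract_snippet(source: str, error_line: int, context: int = 2) -> str:
    """Return the 1-indexed error line plus `context` lines on each side.

    The window is clipped at the start and end of the file.
    """
    lines = source.splitlines()
    start = max(0, error_line - 1 - context)
    end = min(len(lines), error_line + context)
    return "\n".join(lines[start:end])


def full_file_url(sample: PatchSample) -> str:
    """Build a GitHub URL for the complete file at the sample's commit."""
    return (
        f"https://github.com/{sample.repo}"
        f"/blob/{sample.commit_id}/{sample.file_name}"
    )
```

With a helper like `full_file_url`, a reader who wants more context than the snippet provides can open the complete file at the exact commit from which the sample was taken.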

The dataset contains only JavaScript programs, and the errors are detected by the popular static code analyzer ESLint. It can be used for program repair, code generation, bug finding, transfer learning, and other areas of machine learning for code.
