CommitPack

Introduced by Muennighoff et al. in OctoPack: Instruction Tuning Code Large Language Models

CommitPack is a 4TB dataset of commits scraped from permissively licensed GitHub repositories.

https://huggingface.co/datasets/bigcode/commitpack

License


  • Unknown
