JuICe (JuICe Dataset)

JuICe is a corpus of 1.5 million examples with a curated test set of 3.7K instances based on online programming assignments. Compared with existing contextual code generation datasets, JuICe provides refined human-curated data, open-domain code, and an order of magnitude more training data.

Source: JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation

License

  • Unknown
