GINC (Generative IN-Context learning Dataset)

Introduced by Xie et al. in An Explanation of In-context Learning as Implicit Bayesian Inference

GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context learning. The pretraining data is generated by a mixture of hidden Markov models (HMMs). The in-context learning prompt examples are also generated from HMMs, which may or may not belong to the pretraining mixture. The prompts are out-of-distribution with respect to the pretraining data because each prompt concatenates independently sampled examples separated by delimiters, whereas a pretraining document is not built from independent, delimiter-separated pieces. The GitHub repository provides code to generate GINC-style datasets with varying vocabulary sizes, numbers of HMMs, and other parameters.
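The generation process described above can be illustrated with a minimal sketch. This is not the code from the GINC repository: the function names (`make_hmm`, `sample_sequence`, `sample_pretraining_document`, `build_prompt`), the random parameterization of each HMM, the uniform choice over mixture components, and the use of a token id outside the emission vocabulary as the delimiter are all assumptions made for illustration.

```python
import random


def random_stochastic_matrix(rng, rows, cols):
    """Return a rows x cols matrix whose rows are probability distributions."""
    matrix = []
    for _ in range(rows):
        weights = [rng.random() + 1e-3 for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def make_hmm(rng, num_states, vocab_size):
    """Build one HMM with random parameters (hypothetical parameterization)."""
    return {
        "start": random_stochastic_matrix(rng, 1, num_states)[0],
        "trans": random_stochastic_matrix(rng, num_states, num_states),
        "emit": random_stochastic_matrix(rng, num_states, vocab_size),
    }


def sample_sequence(rng, hmm, length):
    """Sample `length` tokens from a single HMM."""
    states = list(range(len(hmm["start"])))
    symbols = list(range(len(hmm["emit"][0])))
    state = rng.choices(states, weights=hmm["start"])[0]
    tokens = []
    for _ in range(length):
        # Emit a token from the current hidden state, then transition.
        tokens.append(rng.choices(symbols, weights=hmm["emit"][state])[0])
        state = rng.choices(states, weights=hmm["trans"][state])[0]
    return tokens


def sample_pretraining_document(rng, hmms, length):
    """Pick one mixture component (uniformly, by assumption) and sample from it."""
    hmm = rng.choice(hmms)
    return sample_sequence(rng, hmm, length)


def build_prompt(rng, hmm, num_examples, example_length, delimiter):
    """Concatenate independently sampled examples, separated by a delimiter."""
    prompt = []
    for i in range(num_examples):
        if i > 0:
            prompt.append(delimiter)
        # Each example restarts the HMM, so examples are independent.
        prompt.extend(sample_sequence(rng, hmm, example_length))
    return prompt


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 10
    hmms = [make_hmm(rng, num_states=4, vocab_size=vocab_size) for _ in range(3)]
    print(sample_pretraining_document(rng, hmms, length=20))
    # Assumption: the delimiter is the token id just past the emission vocabulary.
    print(build_prompt(rng, hmms[0], num_examples=3, example_length=5,
                       delimiter=vocab_size))
```

Because every example in the prompt restarts the HMM from its initial distribution, the transitions across delimiters do not follow the dynamics seen in a single pretraining document, which is the distribution mismatch the description refers to. The sketch omits details of the actual dataset, such as how the final test example in a prompt is formed.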





