SurgeGlobal/Evol-Instruct

Introduced by Dissanayake et al. in OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Dataset Generation

  • Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
  • Seed Instructions: Selected from the databricks/databricks-dolly-15k dataset
  • Generation Approach: Iterative evolution of instructions using a conversational syntax for in-depth and in-breadth evolving
  • Total Instructions: 2,304 instruction tuning data samples
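The iterative evolution described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the prompt templates, the `evolve` function, the `echo_model` stub, and the output field names are all hypothetical, and the stub stands in for a call to the base model. The real prompts use a conversational syntax that is not reproduced here.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical prompt templates (assumption): the real prompts are not given in this card.
TEMPLATES: Dict[str, str] = {
    "in-depth": "Rewrite this instruction so that it is more complex:\n{instruction}",
    "in-breadth": "Write a new, different instruction in the same domain as:\n{instruction}",
}


def evolve(
    seeds: List[str],
    generate: Callable[[str], str],
    rounds: int = 2,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, str]]:
    """Evolve each instruction once per round, feeding the results into the next round."""
    rng = rng or random.Random(0)
    pool = list(seeds)
    evolved: List[Dict[str, str]] = []
    for _ in range(rounds):
        next_pool: List[str] = []
        for instruction in pool:
            # Pick one of the two evolution strategies at random.
            strategy = rng.choice(sorted(TEMPLATES))
            prompt = TEMPLATES[strategy].format(instruction=instruction)
            new_instruction = generate(prompt).strip()
            if not new_instruction:
                # Generation failed: keep the parent so it can be retried next round.
                next_pool.append(instruction)
                continue
            evolved.append(
                {"instruction": new_instruction, "evolution_strategy": strategy}
            )
            next_pool.append(new_instruction)
        pool = next_pool
    return evolved


def echo_model(prompt: str) -> str:
    """Stand-in for the base model (assumption): echoes the instruction with a marker."""
    return prompt.splitlines()[-1] + " (evolved)"


if __name__ == "__main__":
    seeds = ["Name three primary colors.", "Summarize the plot of Hamlet."]
    for row in evolve(seeds, echo_model, rounds=2):
        print(row["evolution_strategy"], "|", row["instruction"])
```

In practice `generate` would wrap a call to the base model, and each evolved instruction would also be sent to the model to obtain its response.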

Structure

The dataset entries consist of:

  • Instruction
  • Response
  • Evolution Strategy (in-depth or in-breadth)
  • Category (of the original instruction)
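To make the entry layout concrete, one way to model a record in Python is shown below. The class name `EvolRecord`, the snake_case field names, and the two sample records are assumptions made for illustration; the column names in the published files may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

VALID_STRATEGIES = ("in-depth", "in-breadth")


@dataclass(frozen=True)
class EvolRecord:
    """One dataset entry; field names are illustrative, not confirmed column names."""

    instruction: str
    response: str
    evolution_strategy: str  # "in-depth" or "in-breadth"
    category: str  # category of the original seed instruction

    def __post_init__(self) -> None:
        # Reject any strategy other than the two named in the card.
        if self.evolution_strategy not in VALID_STRATEGIES:
            raise ValueError(f"unknown strategy: {self.evolution_strategy!r}")


def count_by_strategy(records: Iterable[EvolRecord]) -> Counter:
    """Count how many records were produced by each evolution strategy."""
    return Counter(r.evolution_strategy for r in records)


# Made-up sample records, for illustration only.
records = [
    EvolRecord(
        instruction="Explain why the sky is blue, including the role of wavelength.",
        response="Shorter wavelengths of sunlight scatter more in the atmosphere.",
        evolution_strategy="in-depth",
        category="open_qa",
    ),
    EvolRecord(
        instruction="List ways to reduce household water use.",
        response="Fix leaks, take shorter showers, and run full laundry loads.",
        evolution_strategy="in-breadth",
        category="brainstorming",
    ),
]

if __name__ == "__main__":
    print(count_by_strategy(records))
```

Validating the strategy at construction time keeps malformed rows from slipping into later filtering or counting steps.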

Usage

The Evol-Instruct dataset supports the automatic evolution of instruction datasets: evolving instructions increases their complexity and diversity, which helps when training language models for a wide range of tasks.

Citation

If you find our work useful, please cite our paper as follows:

@misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data}, 
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Dataset Authors

Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake

License


  • Apache 2.0
