SurgeGlobal/Evol-Instruct

Introduced by Dissanayake et al. in OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data

Dataset Generation

Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
Seed Instructions: Selected from the databricks/databricks-dolly-15k dataset
Generation Approach: Iterative evolution of instructions using a conversational syntax for in-depth and in-breadth evolving
Total Instructions: 2,304 instruction tuning data samples
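The iterative evolution described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the prompt templates, the random choice between strategies, the number of rounds, and the `generate` callable (a stand-in for a call to the base model) are hypothetical and are not taken from the paper, and the conversational prompt syntax the authors mention is not modeled.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical prompt templates; the wording actually used by the authors is not given here.
IN_DEPTH_TEMPLATE = (
    "Rewrite the following instruction so that it is more complex "
    "but still answerable:\n{instruction}"
)
IN_BREADTH_TEMPLATE = (
    "Write a new instruction in the same domain as the following one, "
    "covering a different topic:\n{instruction}"
)


def evolve(
    seeds: List[str],
    generate: Callable[[str], str],
    rounds: int = 2,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Evolve each seed for `rounds` iterations.

    Returns (evolved_instruction, strategy) pairs. `generate` is any function
    that maps a prompt to model output (assumed, not a real API).
    """
    rng = rng or random.Random(0)
    pool = list(seeds)
    evolved: List[Tuple[str, str]] = []
    for _ in range(rounds):
        next_pool = []
        for instruction in pool:
            # Assumption: the strategy is picked uniformly at random per step.
            strategy = rng.choice(["in-depth", "in-breadth"])
            template = IN_DEPTH_TEMPLATE if strategy == "in-depth" else IN_BREADTH_TEMPLATE
            new_instruction = generate(template.format(instruction=instruction))
            evolved.append((new_instruction, strategy))
            # The output of this round becomes the input of the next round.
            next_pool.append(new_instruction)
        pool = next_pool
    return evolved


def stub_generate(prompt: str) -> str:
    """Stand-in for the base model: echoes the instruction with a marker."""
    return prompt.splitlines()[-1] + " (evolved)"


results = evolve(["Name three primary colors.", "Explain what a noun is."], stub_generate)
```

In practice `stub_generate` would be replaced by a call to the base model, and the evolved instructions would then be paired with generated responses.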

Dataset Sources

Repository: Bitbucket Project
Paper: Pre-Print

Structure

The dataset entries consist of:

- Instruction
- Response
- Evolution Strategy (in-depth or in-breadth)
- Category (of the original instruction)
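As a rough illustration of this structure, one entry could be modeled as a small record type. The attribute names below (`instruction`, `response`, `evolution_strategy`, `category`) are assumptions chosen to mirror the four fields listed above and may not match the column names in the released files. The sample rows are made-up placeholders, not real dataset content.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EvolInstructEntry:
    """One dataset entry; attribute names are hypothetical."""
    instruction: str
    response: str
    evolution_strategy: str  # "in-depth" or "in-breadth"
    category: str            # category of the original seed instruction


def count_by_strategy(entries: Iterable[EvolInstructEntry]) -> Counter:
    """Count how many entries were produced by each evolution strategy."""
    return Counter(entry.evolution_strategy for entry in entries)


# Placeholder rows for illustration only.
sample_entries = [
    EvolInstructEntry("Placeholder instruction A", "Placeholder response A", "in-depth", "open_qa"),
    EvolInstructEntry("Placeholder instruction B", "Placeholder response B", "in-breadth", "open_qa"),
    EvolInstructEntry("Placeholder instruction C", "Placeholder response C", "in-breadth", "brainstorming"),
]

strategy_counts = count_by_strategy(sample_entries)
```

A count like this is a quick way to check how the in-depth and in-breadth strategies are balanced before mixing the data with other instruction sets.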

Usage

The Evol-Instruct Dataset is designed for the automatic evolution of instruction datasets, enhancing the complexity and diversity of instructions to train language models for a wide range of tasks.

Citation

If you find our work useful, please cite our paper as follows:

@misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data}, 
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Dataset Authors

Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake

Tasks

Instruction Following

Similar Datasets

SurgeGlobal/Orca

SurgeGlobal/LaMini

License

Apache 2.0

Modalities

Texts

Languages

English
