Dataset Generation

  • Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
  • Seed Instructions: Derived from the FLAN-v2 Collection.
  • Generation Approach: Explanation tuning, with detailed step-by-step responses generated by the base model (see the sketch after this list).
  • Total Instructions: 5,507 explanation-tuning samples.
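The exact generation setup is not reproduced here; the snippet below is a minimal sketch of how such explanation-tuning responses could be collected. The prompt template with the <|prompt|>/<|answer|> markers follows the h2oGPT model card, while the system message and seed instruction are hypothetical examples, not items from the actual dataset.

```python
# Illustrative sketch of the generation step (not the authors' exact pipeline).
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# An Orca-style system message elicits a step-by-step explanation
# for a FLAN-v2 seed instruction (both hypothetical examples here).
system_message = "You are a helpful assistant. Explain your answer step by step."
seed_instruction = "Does the premise 'A man is running' entail 'A person is moving'?"

# Prompt template assumed from the h2oGPT model card.
prompt = f"<|prompt|>{system_message} {seed_instruction}<|endoftext|><|answer|>"
output = generator(prompt, max_new_tokens=512, do_sample=False)
print(output[0]["generated_text"])
```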

Structure

The dataset entries consist of:

  • Query
  • Response
  • System Message (when applicable)
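One quick way to inspect these fields is with the datasets library, as sketched below. The dataset ID and the column names ("query", "response", "system_message") are assumptions for illustration; check the hosted card for the actual names.

```python
# Sketch of loading and inspecting a few entries (IDs and column names assumed).
from datasets import load_dataset

dataset = load_dataset("SurgeGlobal/Orca", split="train")  # hypothetical ID

for entry in dataset.select(range(3)):
    print("Query:   ", entry["query"])
    print("Response:", entry["response"])
    # The system message is only present for some entries.
    print("System:  ", entry.get("system_message") or "<none>")
```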

Usage

The Orca Dataset is intended for fine-tuning language models to imitate not only the style but also the reasoning process of large foundation models (LFMs), thereby improving the safety and quality of model responses.
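The card does not prescribe a training recipe; the sketch below shows one possible way to serialize entries into supervised fine-tuning texts, using the same assumed field names as above and an illustrative prompt template.

```python
# One possible serialization of entries for supervised fine-tuning.
# The template and field names are assumptions for illustration.
def format_entry(entry: dict) -> str:
    """Join system message, query, and response into a single training text."""
    parts = []
    if entry.get("system_message"):
        parts.append(f"### System:\n{entry['system_message']}")
    parts.append(f"### Instruction:\n{entry['query']}")
    parts.append(f"### Response:\n{entry['response']}")
    return "\n\n".join(parts)

example = {  # hypothetical entry
    "system_message": "Explain your reasoning step by step.",
    "query": "Is 17 a prime number?",
    "response": "Step 1: Check divisibility by primes up to 4. Step 2: 17 is prime.",
}
print(format_entry(example))
```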

Citation

If you find our work useful, please cite our paper as follows:

@misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data}, 
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Dataset Authors

Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake

License

  • Apache 2.0

Modalities

  • Text

Languages

  • English