Sequential Instructions

Introduced by Kirk et al. in Understanding the Effects of RLHF on LLM Generalisation and Diversity

This is the sequential instructions dataset from the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity". The dataset is in the alpaca_eval format.
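As a rough illustration of working with data in this format, the sketch below parses and validates a list of records. It assumes that an alpaca_eval-style file is a JSON list of objects, each with at least an "instruction" field (fields such as "output" and "generator" often appear as well). The function name parse_records and the sample record are hypothetical and are not taken from the dataset itself.

```python
import json
from typing import Any, Dict, List

# Assumption: each record carries at least an "instruction" field.
# Other fields (e.g. "output", "generator") are treated as optional here.
REQUIRED_KEYS = {"instruction"}


def parse_records(text: str) -> List[Dict[str, Any]]:
    """Parse a JSON string holding a list of alpaca_eval-style records."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a JSON object")
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
    return records


# Hypothetical sample record, made up for illustration only.
SAMPLE = json.dumps(
    [
        {
            "instruction": "List three fruits, then sort them alphabetically.",
            "output": "",
            "generator": "example-model",
        }
    ]
)

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record["instruction"])
```

To use this on a real file, read its contents into a string and pass that to parse_records; the exact file name and field set should be checked against the repository linked on this page.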

For information about how the dataset was generated, see https://github.com/RobertKirk/stanford_alpaca.

Each instruction in the dataset generally contains a sequence of steps that we expect the model to complete in a single response. In our work, we found that RLHF models generalise much better to this dataset than SFT models when both are trained on the AlpacaFarm datasets.

License


  • Unknown