LongForm

Introduced by Köksal et al. in LongForm: Effective Instruction Tuning with Reverse Instructions

The LongForm dataset is built by pairing English corpus examples with generated instructions. It draws a diverse set of human-written documents from existing corpora such as C4 and Wikipedia, and uses LLMs to generate an instruction for each document. In addition to these examples derived from raw text corpora, it includes structured corpus examples and examples from various NLP tasks, such as email writing, grammar error correction, story/poem generation, and text summarization.
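The core step, generating an instruction for an existing document so that the document becomes the target output, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: `PROMPT_TEMPLATE`, `reverse_instruction`, `InstructionExample`, and `fake_llm` are all hypothetical names, the prompt wording is invented, and `fake_llm` is a stub standing in for whatever LLM call is actually used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InstructionExample:
    instruction: str  # LLM-generated instruction (the model input)
    output: str       # original human-written document (the target)


# Hypothetical prompt template; NOT the prompt used to build LongForm.
PROMPT_TEMPLATE = (
    "Write an instruction that a user could give to an assistant "
    "so that the following text would be a good response.\n\n"
    "Text:\n{document}\n\nInstruction:"
)


def reverse_instruction(document: str, generate: Callable[[str], str]) -> InstructionExample:
    """Pair a corpus document with an instruction generated for it by `generate`."""
    prompt = PROMPT_TEMPLATE.format(document=document.strip())
    instruction = generate(prompt).strip()
    # The human-written document is kept unchanged as the target output.
    return InstructionExample(instruction=instruction, output=document)


# Stub standing in for a real LLM call (assumption: any prompt -> text function).
def fake_llm(prompt: str) -> str:
    return " Write a short paragraph explaining what photosynthesis is. "


doc = "Photosynthesis is the process by which plants convert light into chemical energy."
example = reverse_instruction(doc, fake_llm)
print(example.instruction)
```

Passing the generator as a callable keeps the sketch independent of any particular LLM client; in practice it would wrap a real model call, and the resulting (instruction, output) pairs form the instruction-tuning examples.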

Source: LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction
