12 dataset results for Instruction Following AND Texts

IFEval (Instruction Following Evaluation Dataset)

This dataset evaluates the instruction-following ability of large language models. It contains 500+ prompts with instructions such as "write an article with more than 800 words" and "wrap your response with double quotation marks".
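Instructions of this kind can be checked programmatically. The following is a minimal sketch of two such checks, assuming a simple whitespace word count and a plain ASCII double-quote test; the function names are hypothetical and this is not the benchmark's official evaluation code.

```python
def has_more_than_n_words(response: str, n: int) -> bool:
    """Return True if the response has more than n whitespace-separated words.

    Assumption: words are counted by splitting on whitespace.
    """
    return len(response.split()) > n


def is_wrapped_in_double_quotes(response: str) -> bool:
    """Return True if the response, ignoring outer whitespace, starts and ends with '"'."""
    stripped = response.strip()
    return len(stripped) >= 2 and stripped.startswith('"') and stripped.endswith('"')
```

For example, `has_more_than_n_words(article, 800)` would correspond to the first instruction quoted above, and `is_wrapped_in_double_quotes(reply)` to the second.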

11 PAPERS • 1 BENCHMARK

MIMIC-IT

MultI-Modal In-Context Instruction Tuning (MIMIC-IT) is a dataset for instruction tuning of multi-modal models, motivated by the Flamingo model's upstream interleaved format pretraining dataset. Each data sample consists of a queried image-instruction-answer triplet, with the instruction and answer tailored to the image, plus context. The context contains a series of image-instruction-answer triplets that contextually correlate with the queried triplet, emulating the relationship between the context and the queried image-text pair found in the MMC4 dataset.
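The sample layout described above could be modeled roughly as follows. This is an illustrative sketch only: the class names, field names, and the `to_interleaved` helper are hypothetical and do not reflect the dataset's actual file schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Triplet:
    image: str        # hypothetical: an image path or identifier
    instruction: str
    answer: str


@dataclass
class Sample:
    query: Triplet                                         # the queried triplet
    context: List[Triplet] = field(default_factory=list)   # related in-context triplets


def to_interleaved(sample: Sample) -> List[str]:
    """Flatten context triplets, then the query, into one interleaved sequence.

    The query's answer is left out because it is the target to predict.
    """
    parts: List[str] = []
    for t in sample.context:
        parts += [f"<image:{t.image}>", t.instruction, t.answer]
    parts += [f"<image:{sample.query.image}>", sample.query.instruction]
    return parts
```

Placing the context triplets before the query mirrors the interleaved image-text format the description refers to.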

9 PAPERS • NO BENCHMARKS YET

UGIF

UGIF is a multi-lingual, multi-modal UI grounded dataset for step-by-step task completion on the smartphone. It contains 523 natural language instructions with paired sequences of multilingual UI screens and actions that show how to execute the task in eight languages.

7 PAPERS • NO BENCHMARKS YET

Bactrian-X

Bactrian-X is a comprehensive multilingual parallel dataset of 3.4 million instruction-response pairs across 52 languages. The instructions were obtained from alpaca-52k and dolly-15k and translated into 52 languages (52 languages x 67k instances = 3.4M instances).
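The size arithmetic can be reproduced in a few lines. This sketch assumes the rounded source sizes implied by the dataset names (52k and 15k); the exact instance counts may differ slightly.

```python
# Rounded source sizes, taken from the names alpaca-52k and dolly-15k (assumption).
alpaca_instances = 52_000
dolly_instances = 15_000
languages = 52

per_language = alpaca_instances + dolly_instances  # about 67k instances per language
total = per_language * languages                    # about 3.4M to 3.5M instances overall

print(per_language, total)
```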

4 PAPERS • NO BENCHMARKS YET

Alpaca Data Galician

1 PAPER • NO BENCHMARKS YET

InstructOpenWiki

InstructOpenWiki is a substantial instruction tuning dataset for Open-world IE enriched with a comprehensive corpus, extensive annotations, and diverse instructions.

1 PAPER • NO BENCHMARKS YET

Sequential Instructions

This is the sequential instructions dataset from the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity". The dataset is in the alpaca_eval format.

1 PAPER • NO BENCHMARKS YET

SurgeGlobal/Evol-Instruct

Dataset Generation

1 PAPER • NO BENCHMARKS YET

SurgeGlobal/LaMini

Overview: The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.

1 PAPER • NO BENCHMARKS YET

SurgeGlobal/Orca

Dataset Generation

1 PAPER • NO BENCHMARKS YET

Tamil Alpaca

Dataset Card for "tamil-alpaca"

1 PAPER • NO BENCHMARKS YET

Tamil Alpaca Orca

Dataset Card for "tamil-alpaca": This repository includes Tamil-translated versions of the Alpaca dataset and a subset of the OpenOrca dataset.

1 PAPER • NO BENCHMARKS YET
