This dataset evaluates instruction following ability of large language models. There are 500+ prompts with instructions such as "write an article with more than 800 words", "wrap your response with double quotation marks", etc.
11 PAPERS • 1 BENCHMARK
MultI-Modal In-Context Instruction Tuning (MIMIC-IT) is a dataset for instruction tuning into multi-modal models, motivated by the Flamingo model's upstream interleaved format pretraining dataset. The data sample consists of a queried image-instruction-answer triplet, with the instruction-answer tailored to the image, and context. The context contains a series of image-instruction-answer triplets that contextually correlate with the queried triplet, emulating the relationship between the context and the queried image-text pair found in the MMC4 dataset.
9 PAPERS • NO BENCHMARKS YET
UGIF is a multi-lingual, multi-modal UI grounded dataset for step-by-step task completion on the smartphone. It contains 523 natural language instructions with paired sequences of multilingual UI screens and actions that show how to execute the task in eight languages.
7 PAPERS • NO BENCHMARKS YET
Bactrian-X is a comprehensive multilingual parallel dataset of 3.4 million instruction-response pairs across 52 languages. The instructions were obtained from alpaca-52k, and dolly-15k, and tranlated into 52 languages (52 languages x 67k instances = 3.4M instances).
4 PAPERS • NO BENCHMARKS YET
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
1 PAPER • NO BENCHMARKS YET
InstructOpenWiki is a substantial instruction tuning dataset for Open-world IE enriched with a comprehensive corpus, extensive annotations, and diverse instructions.
This is the sequential instructions dataset from Understanding the Effects of RLHF on LLM Generalisation and Diversity. The dataset is in the alpaca_eval format.
Dataset Generation
Overview The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.
Dataset Card for "tamil-alpaca"
Dataset Card for "tamil-alpaca" This repository includes a Tamil-translated versions of the Alpaca dataset and a subset of OpenOrca dataset.