VideoInstruct (Video Instruction Dataset)

Introduced by Maaz et al. in Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

The Video Instruction Dataset is used to train Video-ChatGPT. It consists of 100,000 high-quality video instruction pairs. The dataset is built with a combination of human-assisted and semi-automatic annotation techniques, which produce question-answer pairs covering:

  1. Video summarization
  2. Description-based question-answers (exploring spatial, temporal, relational, and reasoning concepts)
  3. Creative/generative question-answers
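
As a rough illustration of how the three categories above might be represented when working with such data, here is a minimal Python sketch. The record layout (`video_id`, `question`, `answer`, `category`), the `Category` enum, and the sample records are all hypothetical and chosen for illustration only; they are not the dataset's actual schema or contents.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three question-answer categories named in the description."""
    SUMMARIZATION = "video summarization"
    DESCRIPTION = "description-based question-answer"
    CREATIVE = "creative/generative question-answer"


@dataclass(frozen=True)
class InstructionPair:
    """One video instruction pair. Field names are assumptions, not the real schema."""
    video_id: str
    question: str
    answer: str
    category: Category


def count_by_category(pairs):
    """Return how many pairs fall into each category."""
    return Counter(pair.category for pair in pairs)


# Made-up placeholder records, not taken from the dataset.
examples = [
    InstructionPair("vid_001", "Summarize the video.",
                    "A person prepares a meal.", Category.SUMMARIZATION),
    InstructionPair("vid_001", "What happens after the pan is heated?",
                    "Vegetables are added.", Category.DESCRIPTION),
    InstructionPair("vid_002", "Write a short poem about the video.",
                    "Waves roll in at dawn.", Category.CREATIVE),
]

counts = count_by_category(examples)
for category in Category:
    print(f"{category.value}: {counts[category]}")
```

Tagging each pair with its category makes it straightforward to check the balance between summarization, description-based, and creative questions before training.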

The details are available at


