A dataset of 69,270,581 video clip, question and answer triplets (v, q, a). HowToVQA69M is two orders of magnitude larger than any of the currently available VideoQA datasets.

On average, each original video results in 43 video clips, where each clip lasts 12.1 seconds and is associated to 1.2 question-answer pairs. Questions and answers contain 8.7 and 2.4 words on average respectively. HowToVQA69M is highly diverse and contains over 16M unique answers, where over 2M unique answers appear more than once and over 300K unique answers appear more than ten times.

Source: Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Papers


Paper Code Results Date Stars

Dataset Loaders


Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages