HowToVQA69M

Introduced by Yang et al. in Just Ask: Learning to Answer Questions from Millions of Narrated Videos

HowToVQA69M is a dataset of 69,270,581 video clip, question, and answer triplets (v, q, a). It is two orders of magnitude larger than any currently available VideoQA dataset.

On average, each original video yields 43 video clips, where each clip lasts 12.1 seconds and is associated with 1.2 question-answer pairs. Questions and answers contain 8.7 and 2.4 words on average, respectively. HowToVQA69M is highly diverse and contains over 16M unique answers, of which over 2M appear more than once and over 300K appear more than ten times.
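As an illustration of how statistics like those above (average word counts, unique answers, answers appearing more than once) could be computed over (v, q, a) triplets, here is a minimal Python sketch. The `Triplet` type, its field names, the `answer_stats` helper, and the toy data are all hypothetical and are not taken from the dataset's actual file format or the authors' code.

```python
from collections import Counter
from typing import NamedTuple


class Triplet(NamedTuple):
    # Hypothetical in-memory form of one (v, q, a) triplet.
    video_id: str  # assumed clip identifier, not the dataset's real schema
    question: str
    answer: str


def answer_stats(triplets):
    """Compute average word counts and unique-answer frequencies."""
    counts = Counter(t.answer for t in triplets)
    n = len(triplets)
    return {
        # Word counts use simple whitespace splitting (an assumption).
        "avg_question_words": sum(len(t.question.split()) for t in triplets) / n,
        "avg_answer_words": sum(len(t.answer.split()) for t in triplets) / n,
        "unique_answers": len(counts),
        "unique_answers_seen_more_than_once": sum(
            1 for c in counts.values() if c > 1
        ),
    }


# Toy data for illustration only; not real HowToVQA69M entries.
toy = [
    Triplet("clip_0", "What is being added to the bowl?", "flour"),
    Triplet("clip_1", "What tool is used to cut the wood?", "circular saw"),
    Triplet("clip_2", "What is poured into the pan?", "flour"),
]
stats = answer_stats(toy)
print(stats)
```

On the full dataset, the same counting would have to run over roughly 69M triplets, so a streaming pass over the annotation files would likely be preferable to loading everything into a list.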

Source: Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Dataset Loaders

antoyang/just-ask

113

Similar Datasets

WebVidVQA3M

TGIF-QA

iVQA

How2QA

License

Unknown

Modalities

Videos
Texts

Languages

English