iVQA is an open-ended VideoQA benchmark that aims to (i) provide a well-defined evaluation by including five correct answer annotations per question and (ii) avoid questions that can be answered without watching the video.
iVQA contains 10,000 video clips with one question and five corresponding answers per clip. Moreover, we manually reduce the language bias by excluding questions that could be answered without watching the video.
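To illustrate why several reference answers per question make open-ended evaluation better defined, here is a minimal sketch of a soft accuracy score. It assumes a rule that gives full credit when at least two of the five annotators agree with the prediction and half credit when only one does; the description above does not state the metric, so treat this rule, the function name `multi_reference_accuracy`, the lowercase normalization, and the example answers as assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def multi_reference_accuracy(
    prediction: str,
    references: Sequence[str],
    min_votes: int = 2,  # assumed threshold for full credit
) -> float:
    """Score one predicted answer against several reference answers.

    Returns min(matching_references / min_votes, 1.0), so a prediction
    confirmed by `min_votes` or more annotators scores 1.0.
    """
    # Hypothetical normalization: trim whitespace and lowercase.
    normalized = prediction.strip().lower()
    votes = Counter(ref.strip().lower() for ref in references)[normalized]
    return min(votes / min_votes, 1.0)


# Hypothetical annotations for a single question (five answers per question).
example_references = ["spoon", "Spoon", "wooden spoon", "spatula", "spoon"]

score_common = multi_reference_accuracy("spoon", example_references)
score_rare = multi_reference_accuracy("spatula", example_references)
score_wrong = multi_reference_accuracy("fork", example_references)
```

With a single reference answer, a valid synonym would score zero; with five, a less common but still correct answer can receive partial credit.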
Source: Just Ask: Learning to Answer Questions from Millions of Narrated Videos