NExT-QA is a VideoQA benchmark targeting the explanation of video contents. It challenges QA models to reason about the causal and temporal actions and understand the rich object interactions in daily activities. It supports both multi-choice and open-ended QA tasks. The videos are untrimmed and the questions usually invoke local video contents for answers.
85 PAPERS • 5 BENCHMARKS