Neptune is a dataset consisting of challenging question-answer-decoy (QAD) sets for long videos (up to 15 minutes). The goal of this dataset is to test video-language models for a broad range of long video reasoning abilities, which are provided as "question type" labels for each question, for example "video summarization", "temporal ordering", "state changes" and "creator intent" amongst others.
Paper | Code | Results | Date | Stars |
---|