HourVideo

Introduced by Chandrasegaran et al. in HourVideo: 1-Hour Video-Language Understanding

We introduce HourVideo, a benchmark dataset for hour-long video-language understanding. HourVideo consists of a novel task suite comprising summarization, perception (recall, tracking), visual reasoning (spatial, temporal, predictive, causal, counterfactual), and navigation (room-to-room, object retrieval) tasks. HourVideo includes 500 manually curated egocentric videos from the Ego4D dataset, spanning durations of 20 to 120 minutes, and features 12,976 high-quality, five-way multiple-choice questions. We hope to establish HourVideo as a benchmark challenge to spur the development of advanced multimodal models capable of truly understanding endless streams of visual data.
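As a rough illustration of the five-way multiple-choice format described above, the sketch below shows one way a question record and an accuracy score could be represented. The `MCQ` class, its field names, and the `accuracy` helper are hypothetical and chosen only for illustration; they are not the official HourVideo schema or evaluation code. The only detail taken from the description is that each question has five answer options, which puts random guessing at 20% accuracy.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Five-way multiple choice, as stated in the dataset description.
NUM_OPTIONS = 5

# Expected accuracy of uniform random guessing over five options.
CHANCE_ACCURACY = 1 / NUM_OPTIONS


@dataclass(frozen=True)
class MCQ:
    """Hypothetical record layout; NOT the official HourVideo schema."""

    video_id: str              # assumed identifier of the source video
    task: str                  # e.g. "summarization" or "navigation"
    question: str
    options: Tuple[str, ...]   # exactly NUM_OPTIONS answer candidates
    answer_index: int          # 0-based index of the correct option

    def __post_init__(self) -> None:
        # Reject records that do not fit the five-way format.
        if len(self.options) != NUM_OPTIONS:
            raise ValueError(f"expected {NUM_OPTIONS} options, got {len(self.options)}")
        if not 0 <= self.answer_index < NUM_OPTIONS:
            raise ValueError("answer_index is out of range")


def accuracy(questions: Sequence[MCQ], predictions: Sequence[int]) -> float:
    """Fraction of questions whose predicted option index matches the answer."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions differ in length")
    if not questions:
        raise ValueError("no questions to score")
    correct = sum(q.answer_index == p for q, p in zip(questions, predictions))
    return correct / len(questions)
```

A model's score on such a set is only meaningful relative to `CHANCE_ACCURACY`, since picking one of five options at random is already right one time in five on average.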
