Collects dense per-video-shot concept annotations.
4 PAPERS • 1 BENCHMARK
MultiSum is a dataset for multimodal summarization with multimodal output (MSMO). It spans 17 categories and 170 subcategories, covering a diverse array of real-world scenarios. The dataset features:
1 PAPER • NO BENCHMARKS YET
A short video clip may contain a progression of multiple events and an interesting storyline. To understand the story behind it, a human needs both to capture the event in every shot and to associate those events with one another.
1 PAPER • 3 BENCHMARKS