VideoXum is an enriched large-scale dataset for cross-modal video summarization, built on top of ActivityNet Captions. It covers three subtasks: Video-to-Video Summarization (V2V-SUM), Video-to-Text Summarization (V2T-SUM), and Video-to-Video&Text Summarization (V2VT-SUM).
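
To make the structure of the three subtasks concrete, below is a minimal sketch of reading VideoXum-style annotations. It assumes a JSON file in which each record pairs an ActivityNet video ID with a video-summary target and a text-summary target; the file name and the field names ("video_id", "vsum", "tsum") are hypothetical placeholders, not the dataset's published schema.

```python
import json

# Hypothetical annotation file; the real release may organize files differently.
with open("videoxum_train.json") as f:
    annotations = json.load(f)

for record in annotations:
    video_id = record["video_id"]      # ActivityNet Captions video ID (assumed field name)
    video_summary = record["vsum"]     # assumed: segments/frames kept in the video summary
    text_summary = record["tsum"]      # assumed: natural-language summary of the same video
    # V2V-SUM targets video_summary, V2T-SUM targets text_summary,
    # and V2VT-SUM predicts both jointly for the same source video.
    print(video_id, len(video_summary), text_summary[:40])
```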

License

  • Unknown
