The How2 dataset contains 13,500 videos, or 300 hours of speech, and is split into 185,187 training, 2022 development (dev), and 2361 test utterances. It has subtitles in English and crowdsourced Portuguese translations.
73 PAPERS • 2 BENCHMARKS
MultiSum is a dataset for multimodal summarization (MSMO). It consists of 17 categories and 170 subcategories to encapsulate a diverse array of real-world scenarios. The dataset features:
1 PAPER • NO BENCHMARKS YET