VMSMO

Introduced by Li et al. in VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles

The Video-based Multimodal Summarization with Multimodal Output (VMSMO) corpus consists of 184,920 document-summary pairs: 180,000 for training, 2,460 for validation, and 2,460 for testing. The task is to generate an appropriate textual summary of an article and to choose a suitable cover frame from the video accompanying the article.
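
As a quick sanity check on the split sizes, the sketch below reads the description as 2,460 pairs each for validation and test, which is the reading under which the counts add up to the stated total of 184,920. The dictionary name and keys are illustrative assumptions and are not taken from the VMSMO repository.

```python
# Split sizes as stated in the dataset description; SPLIT_SIZES and its keys
# are hypothetical names chosen for illustration, not part of the VMSMO release.
SPLIT_SIZES = {"train": 180_000, "validation": 2_460, "test": 2_460}

# Total number of document-summary pairs across all splits.
total = sum(SPLIT_SIZES.values())

# Share of the corpus held by each split.
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

print(f"total pairs: {total}")
for name, share in fractions.items():
    print(f"{name}: {share:.2%}")
```

The validation and test sets are each a little over one percent of the corpus, so nearly all pairs are available for training.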

Source: https://github.com/yingtaomj/VMSMO
