TV show Caption (TVC) is a large-scale multimodal captioning dataset containing 261,490 caption descriptions paired with 108,965 short video moments. TVC is unique in that its captions may also describe dialogues/subtitles, whereas captions in other datasets describe only the visual content.
Source: https://tvr.cs.unc.edu/tvc.html