TVR (TV show Retrieval) is a multimodal moment retrieval dataset. Systems must understand both the videos and their associated subtitle (dialogue) text, which makes the retrieval task more realistic. The dataset contains 109K queries collected on 21.8K videos from 6 TV shows of diverse genres, and each query is associated with a tight temporal window in its video.
Source: TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
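
To make the idea of a query tied to a "tight temporal window" concrete, here is a minimal Python sketch. The `MomentQuery` class, its field names, and the sample values are hypothetical and are not the official TVR schema. `temporal_iou` is a generic overlap measure between two time windows, shown only as one common way to compare a predicted window with an annotated one. The only figures taken from the description above are the 109K queries and 21.8K videos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MomentQuery:
    """Hypothetical record: one text query tied to a time window in one video."""
    query: str        # natural-language description of the moment
    video_id: str     # identifier of the video clip (illustrative)
    start_sec: float  # window start, in seconds
    end_sec: float    # window end, in seconds

    def duration(self) -> float:
        """Length of the annotated window in seconds."""
        return self.end_sec - self.start_sec


def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Intersection-over-union of two (start, end) windows, in [0, 1]."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


# Figures from the dataset description: 109K queries over 21.8K videos.
NUM_QUERIES = 109_000
NUM_VIDEOS = 21_800
queries_per_video = NUM_QUERIES / NUM_VIDEOS  # average queries per video

# Invented sample record, for illustration only.
example = MomentQuery(
    query="A character hands over a cup of coffee.",
    video_id="example_clip_0001",
    start_sec=12.5,
    end_sec=18.0,
)

if __name__ == "__main__":
    print(f"average queries per video: {queries_per_video}")
    print(f"example window length: {example.duration()} s")
    print(f"IoU of (10, 20) vs (15, 25): {temporal_iou((10, 20), (15, 25)):.3f}")
```

Dividing the two published totals gives an average of 5 queries per video. A retrieval system for this kind of data has to return both a video and a `(start, end)` window, so an overlap measure like the one sketched here is a natural way to judge how close a predicted window is to the annotation.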