Ad-hoc video search
7 papers with code • 5 benchmarks • 8 datasets
The Ad-hoc search task completed a three-year cycle (2016-2018) whose goal was to model the end-user search use case: a user searching, via textual sentence queries, for video segments containing persons, objects, activities, locations, etc., and combinations thereof. While the Internet Archive (IACC.3) dataset was used from 2016 to 2018, a new collection based on Vimeo Creative Commons videos (V3C) was adopted starting in 2019 to support the task for at least three more years.
Given the test collection (V3C1 or IACC.3), the master shot boundary reference, and a set of Ad-hoc queries (approx. 30 queries) released by NIST, return for each query a list of at most 1000 shot IDs from the test collection, ranked by their likelihood of containing the queried content.
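The ranking step above can be sketched in a few lines. This is a minimal illustration, not a TRECVID submission tool: it assumes a query embedding and per-shot embeddings (here toy 2-d vectors; real systems obtain them from a trained cross-modal model) and returns the top shot IDs by cosine similarity.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_shots(query_vec, shot_vecs, max_results=1000):
    # shot_vecs: mapping shot_id -> embedding vector (hypothetical inputs).
    # Returns up to max_results shot IDs, most similar first.
    scored = sorted(shot_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [shot_id for shot_id, _ in scored[:max_results]]

# Toy example: three shots with 2-d embeddings.
shots = {"shot1_1": [1.0, 0.0], "shot2_3": [0.0, 1.0], "shot4_7": [0.7, 0.7]}
print(rank_shots([1.0, 0.1], shots))  # -> ['shot1_1', 'shot4_7', 'shot2_3']
```

The 1000-shot cap mirrors the task's per-query limit; the similarity function is where the retrieval methods listed below differ.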
Latest papers
Interpretable Embedding for Ad-hoc Video Search
Answering query with semantic concepts has long been the mainstream approach for video search.
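The concept-based approach mentioned above can be sketched as follows. This is a hedged toy example, assuming a hypothetical bank of pre-computed concept-detector scores per shot: query words are matched to concept names, and shots are ranked by the mean score of the matched concepts.

```python
def concept_search(query_words, concept_scores):
    # concept_scores: shot_id -> {concept_name: detector score in [0, 1]}
    # (hypothetical pre-computed detector outputs). Shots with no matching
    # concept are omitted; the rest are ranked by mean matched score.
    ranked = []
    for shot_id, scores in concept_scores.items():
        matched = [scores[w] for w in query_words if w in scores]
        if matched:
            ranked.append((sum(matched) / len(matched), shot_id))
    ranked.sort(reverse=True)
    return [shot_id for _, shot_id in ranked]

scores = {
    "shot1": {"dog": 0.9, "beach": 0.8},
    "shot2": {"dog": 0.2, "car": 0.7},
}
print(concept_search(["dog", "beach"], scores))  # -> ['shot1', 'shot2']
```

Embedding-based methods such as the ones below replace this explicit word-to-concept matching with a learned joint text-video space.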
(Un)likelihood Training for Interpretable Embedding
Cross-modal representation learning has become a new normal for bridging the semantic gap between text and visual data.
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval.
SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries
Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders.
Dual Encoding for Video Retrieval by Text
In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.
W2VV++: Fully Deep Learning for Ad-hoc Video Search
The backbone of our method is the proposed W2VV++ model, a super version of Word2VisualVec (W2VV) previously developed for visual-to-text matching.
Dual Encoding for Zero-Example Video Retrieval
This paper attacks the challenging problem of zero-example video retrieval.