The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the text query. Typically, the videos are returned as a ranked list of candidates and scored via document retrieval metrics such as Recall@K and median rank.
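The ranked-list evaluation described above can be sketched as follows. This is a minimal illustration, assuming the common convention in this literature that `sim[i, i]` holds the score of query `i`'s ground-truth video; the function name and structure are hypothetical, not from any of the papers listed here.

```python
import numpy as np

def retrieval_metrics(sim, k=5):
    """Score a text-to-video retrieval run with document-retrieval metrics.

    sim: (num_queries, num_videos) similarity matrix where sim[i, i]
    is the score of the ground-truth video for query i (assumed
    diagonal ground-truth convention).
    """
    # Sort candidates for each query from highest to lowest score.
    order = np.argsort(-sim, axis=1)
    # Rank of the ground-truth video for each query (1 = best).
    ranks = np.where(order == np.arange(len(sim))[:, None])[1] + 1
    return {
        f"R@{k}": float(np.mean(ranks <= k)),  # fraction retrieved in top k
        "MedR": float(np.median(ranks)),       # median rank of ground truth
    }
```

For example, with two queries where the first ground-truth video is ranked second and the second is ranked first, `retrieval_metrics` reports `R@1 = 0.5` and `MedR = 1.5`.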
In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.
#11 best model for Action Recognition In Videos on Something-Something V1 (using extra training data)
We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets.
#2 best model for Video Retrieval on LSMDC (using extra training data)
We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pair of multimodal sequence data (e.g., a video clip and a language sentence).
#3 best model for Video Retrieval on LSMDC
Our goal is to condense the multi-modal, extremely high dimensional information from videos into a single, compact video representation for the task of video retrieval using free-form text queries, where the degree of specificity is open-ended.
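Once videos are condensed into compact vectors in a space shared with text, retrieval reduces to nearest-neighbor ranking. A minimal sketch, assuming the query and video embedding networks are given and only their precomputed output vectors are available here:

```python
import numpy as np

def rank_videos(query_emb, video_embs):
    """Rank candidate videos for one text query by cosine similarity
    in a shared embedding space.

    query_emb:  (d,) embedding of the text query
    video_embs: (num_videos, d) compact per-video representations
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    scores = v @ q
    # Indices of candidate videos, best match first.
    return np.argsort(-scores)
```

Because each video is a single compact vector, the candidate pool can be scored with one matrix-vector product, which is what makes fast per-video processing possible at retrieval time.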
SOTA for Video Retrieval on LSMDC
The target of central similarity learning is to encourage hash codes for similar data pairs to be close to a common center and those for dissimilar pairs to converge to different centers in the Hamming space, which substantially improves retrieval accuracy.
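The Hamming-space geometry behind central similarity learning can be illustrated with a toy example. The 8-bit centers below are hypothetical, chosen only to show that a code trained toward one center ends up far from the other:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary hash codes (arrays of 0/1)."""
    return int(np.sum(a != b))

# Hypothetical 8-bit hash centers for two semantic classes; maximally
# separated centers leave the most room between dissimilar codes.
center_a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
center_b = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# A code trained toward center_a sits close to it in Hamming space
# (distance 1) and far from center_b (distance 7), so retrieval by
# Hamming distance separates the two classes cleanly.
code = np.array([0, 0, 0, 1, 1, 1, 1, 1])
```

At query time, items whose codes cluster around the same center are retrieved together, which is why tight clustering around well-separated centers improves retrieval accuracy.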
Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications.
#2 best model for Video Retrieval on MSR-VTT