The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the query. Typically, the candidates are returned as a ranked list and scored with standard retrieval metrics such as recall at rank K (R@K) and median rank.
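To make the ranked-list evaluation concrete, here is a minimal sketch. It assumes a square score matrix `sim` where `sim[i][j]` is the score of video `j` for query `i` and the correct video for query `i` is video `i`. The function names are hypothetical, and ties are broken optimistically (only strictly higher scores push the correct video down).

```python
import statistics


def rank_of_correct(sim_row, correct_idx):
    """1-based rank of the correct video when candidates are sorted by descending score."""
    target = sim_row[correct_idx]
    # Count candidates that score strictly higher than the correct one.
    return 1 + sum(1 for s in sim_row if s > target)


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute R@K (in percent) and median rank (MedR) for a query-by-video score matrix."""
    ranks = [rank_of_correct(row, i) for i, row in enumerate(sim)]
    n = len(ranks)
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    metrics["MedR"] = statistics.median(ranks)
    return metrics


sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.2, 0.4, 0.8],  # query 1: correct video ranked 2nd
    [0.1, 0.7, 0.6],  # query 2: correct video ranked 2nd
]
print(retrieval_metrics(sim))
```

Higher R@K and lower MedR are better; with real benchmarks the matrix is simply much larger.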
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.
This paper proposes an end-to-end deep hashing framework with category mask for fast video retrieval.
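Hashing speeds up retrieval because comparing binary codes reduces to counting differing bits. The sketch below shows only that retrieval step, assuming real-valued video embeddings are binarized by sign; the helpers `binarize`, `pack`, and `rank_by_hamming` are hypothetical and do not reproduce the paper's category-mask network.

```python
def binarize(embedding):
    """Map a real-valued embedding to a tuple of {0, 1} bits by sign (assumed binarization)."""
    return tuple(1 if x > 0 else 0 for x in embedding)


def pack(bits):
    """Pack a bit tuple into an int so Hamming distance becomes a popcount of XOR."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, db_codes):
    """Database indices sorted by increasing Hamming distance (ties broken by index)."""
    return sorted(range(len(db_codes)), key=lambda i: (hamming(query_code, db_codes[i]), i))


videos = [[0.5, -1.0, 2.0, 0.1], [-0.3, -0.2, 0.4, -1.0], [0.9, -0.7, 1.1, -0.2]]
db = [pack(binarize(v)) for v in videos]
query = pack(binarize([1.0, -1.0, 1.0, 1.0]))
print(rank_by_hamming(query, db))
```

XOR plus popcount on short integer codes is far cheaper than floating-point similarity over long embeddings, which is the motivation for hashing-based video retrieval.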
In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.
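One common way to cover long-term content at a fixed per-video cost is segment-based sparse sampling: split the video into equal segments and take one frame from each. The sketch below assumes this scheme purely for illustration; whether the architecture above samples exactly this way is not stated here, and `segment_sample` is a hypothetical helper.

```python
import random


def segment_sample(num_frames, num_segments, rng=None):
    """Pick one frame index per equal-length segment of [0, num_frames).

    With rng=None the center frame of each segment is taken (deterministic,
    e.g. at test time); otherwise a random frame inside each segment is taken.
    """
    seg_len = num_frames / num_segments
    indices = []
    for s in range(num_segments):
        start, end = s * seg_len, (s + 1) * seg_len
        if rng is None:
            idx = int((start + end) / 2)
        else:
            idx = int(start + rng.random() * (end - start))
        # Guard against float rounding at the last segment boundary.
        indices.append(min(idx, num_frames - 1))
    return indices


print(segment_sample(32, 4))
print(segment_sample(32, 4, rng=random.Random(0)))
```

The cost depends only on `num_segments`, not on video length, while the segments still span the whole clip.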
#12 best model for Action Recognition In Videos on Something-Something V1 (using extra training data)
We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets.
#2 best model for Video Retrieval on LSMDC (using extra training data)
We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pair of multimodal sequences (e.g., a video clip and a language sentence).
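A simple way to picture "similarity between two sequences" is a joint matrix of element-wise similarities (frames by words) that is then pooled into one score. The sketch below assumes precomputed frame and word feature vectors and uses fixed max-then-mean pooling; this is a stand-in for illustration, not JSFusion's learned fusion, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pairwise_similarity(frames, words):
    """Joint matrix: entry [i][j] is the cosine similarity of frame i and word j."""
    return [[cosine(f, w) for w in words] for f in frames]


def sequence_similarity(frames, words):
    """Score a (video, sentence) pair: best-matching frame per word, averaged over words."""
    sim = pairwise_similarity(frames, words)
    best_per_word = [max(sim[i][j] for i in range(len(frames))) for j in range(len(words))]
    return sum(best_per_word) / len(best_per_word)


frames = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 1.0]]
print(sequence_similarity(frames, words))
```

A learned model would replace the fixed pooling with trainable layers over the same joint matrix.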
#3 best model for Video Retrieval on LSMDC
Our goal is to condense the multi-modal, extremely high-dimensional information from videos into a single, compact video representation for the task of video retrieval using free-form text queries, where the degree of specificity is open-ended.
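One simple way to turn several modality-specific ("expert") embeddings into a single video vector is to L2-normalize each one, scale it by a weight, and concatenate. The sketch below assumes fixed, hypothetical weights and a query encoded into matching per-modality slots; learned methods may instead predict the weights from the query.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def fuse(experts, weights, order):
    """Concatenate weighted, L2-normalized expert embeddings in a fixed modality order."""
    fused = []
    for name in order:
        fused.extend(weights[name] * x for x in l2_normalize(experts[name]))
    return fused


def score(video_vec, query_vec):
    """Dot product; equals the weighted sum of per-modality cosine similarities."""
    return sum(a * b for a, b in zip(video_vec, query_vec))


order = ["rgb", "audio"]
video = fuse({"rgb": [3.0, 4.0], "audio": [1.0, 0.0]}, {"rgb": 0.75, "audio": 0.25}, order)
query = fuse({"rgb": [3.0, 4.0], "audio": [0.0, 1.0]}, {"rgb": 1.0, "audio": 1.0}, order)
print(score(video, query))
```

Because each block is normalized before weighting, no single high-dimensional modality dominates the compact representation purely by its scale.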
SOTA for Video Retrieval on LSMDC
The target of central similarity learning is to encourage hash codes for similar data pairs to be close to a common center and those for dissimilar pairs to converge to different centers in the Hamming space, which substantially improves retrieval accuracy.
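The sketch below illustrates the two ingredients of central similarity under stated assumptions: hash centers taken from the rows of a Sylvester Hadamard matrix and its negation (so any two centers differ in at least half of the bits), and a binary cross-entropy term pulling a relaxed code in (-1, 1) toward its {-1, +1} center. Treat both choices as assumptions rather than the paper's exact recipe; the helper names are hypothetical.

```python
import math


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two, entries +1/-1)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def hash_centers(num_classes, bits):
    """Up to 2*bits centers from rows of H and -H; pairwise Hamming distance is bits/2 or bits."""
    h = hadamard(bits)
    candidates = h + [[-x for x in row] for row in h]
    if num_classes > len(candidates):
        raise ValueError("too many classes for this code length")
    return candidates[:num_classes]


def central_loss(code, center, eps=1e-7):
    """Mean binary cross-entropy between relaxed code bits and center bits, both mapped to [0, 1]."""
    loss = 0.0
    for h, c in zip(code, center):
        p = min(max((h + 1) / 2, eps), 1 - eps)  # predicted bit probability, clamped
        t = (c + 1) / 2                           # target bit in {0, 1}
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss / len(code)


centers = hash_centers(4, 8)
print(centers[1])
print(central_loss(centers[0], centers[0]), central_loss(centers[1], centers[0]))
```

Minimizing this loss drives codes of the same class toward one shared center, and the large separation between centers keeps dissimilar classes apart in Hamming space.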
Constructing a joint representation that is invariant across different modalities (e.g., video and language) is of significant importance in many multimedia applications.
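A widely used objective for learning such a joint video-text space is a bidirectional max-margin ranking loss: each matched pair should score higher than every mismatched pair by a margin, in both the video-to-text and text-to-video directions. This is a generic sketch, not necessarily the loss used in the work above; the margin value is an assumption.

```python
def bidirectional_ranking_loss(sim, margin=0.2):
    """sim[i][j]: similarity of video i and sentence j; matched pairs lie on the diagonal.

    Sums hinge violations over all negatives in both directions, averaged over positives.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Video i as anchor: sentence j should score at least `margin` below sentence i.
            total += max(0.0, margin + sim[i][j] - sim[i][i])
            # Sentence i as anchor: video j should score at least `margin` below video i.
            total += max(0.0, margin + sim[j][i] - sim[i][i])
    return total / n


print(bidirectional_ranking_loss([[1.0, 0.0], [0.0, 1.0]]))
print(bidirectional_ranking_loss([[0.5, 0.5], [0.5, 0.5]]))
```

The loss is zero once every matched pair beats all mismatched ones by the margin, which is exactly the modality-invariant alignment the joint space is meant to capture.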
#2 best model for Video Retrieval on MSR-VTT
We target a challenging problem: training deep neural networks (DNNs) on abnormal training data, where a considerable proportion of observations and their labels are semantically unmatched, e.g., corrupted labels or out-of-distribution training examples.