Ad-hoc video search

7 papers with code • 5 benchmarks • 8 datasets

The Ad-hoc search task completed a three-year cycle (2016-2018) whose goal was to model the end-user search use case: a user searching, with textual sentence queries, for video segments containing persons, objects, activities, locations, etc., and combinations of these. The Internet Archive (IACC.3) dataset was used from 2016 to 2018; starting in 2019, a new data collection based on Vimeo Creative Commons (V3C) will be adopted to support the task for at least three more years.

Given the test collection (V3C1 or IACC.3), the master shot boundary reference, and a set of Ad-hoc queries (approx. 30 queries) released by NIST, return for each query a list of at most 1000 shot IDs from the test collection, ranked by their likelihood of containing the content the query describes.
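The output requirement above can be illustrated with a minimal Python sketch. It assumes a relevance score has already been computed for every (query, shot) pair by some retrieval model; the task description does not prescribe how scores are obtained. The function names, query IDs, shot IDs, and score values below are hypothetical and chosen only for illustration, and the sketch does not reproduce any official submission file format.

```python
from typing import Dict, List

MAX_RESULTS = 1000  # per-query cap stated in the task description


def rank_shots(scores: Dict[str, float], max_results: int = MAX_RESULTS) -> List[str]:
    """Return shot IDs sorted by descending score, truncated to max_results."""
    # Sort by score (highest first); ties are broken by shot ID so the
    # ordering is deterministic. The tie-break rule is an assumption.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [shot_id for shot_id, _ in ordered[:max_results]]


def build_run(query_scores: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Map each query ID to its ranked list of shot IDs."""
    return {qid: rank_shots(scores) for qid, scores in query_scores.items()}


# Toy input: made-up query IDs, shot IDs, and scores.
toy_scores = {
    "q1": {"shot_a": 0.12, "shot_b": 0.87, "shot_c": 0.45},
    "q2": {"shot_a": 0.30, "shot_b": 0.30, "shot_c": 0.05},
}
run = build_run(toy_scores)
print(run)
```

In practice the scores would come from a text-to-video matching model such as those listed below; the ranking and truncation step stays the same regardless of how the scores are produced.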

Interpretable Embedding for Ad-hoc Video Search

nikkiwoo-gh/Dual-task-video-retrieval 19 Feb 2024

Answering queries with semantic concepts has long been the mainstream approach for video search.

0
19 Feb 2024

(Un)likelihood Training for Interpretable Embedding

nikkiwoo-gh/ITV 1 Jul 2022

Cross-modal representation learning has become a new normal for bridging the semantic gap between text and visual data.

0
01 Jul 2022

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

ruc-aimc-lab/laff 3 Dec 2021

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval.

39
03 Dec 2021

SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries

li-xirong/sea 24 Nov 2020

Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders.

6
24 Nov 2020

Dual Encoding for Video Retrieval by Text

danieljf24/hybrid_space 10 Sep 2020

This paper proposes a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.

87
10 Sep 2020

W2VV++: Fully Deep Learning for Ad-hoc Video Search

li-xirong/w2vvpp ACM Multimedia 2019

The backbone of our method is the proposed W2VV++ model, a super version of Word2VisualVec (W2VV) previously developed for visual-to-text matching.

28
21 Oct 2019

Dual Encoding for Zero-Example Video Retrieval

danieljf24/dual_encoding CVPR 2019

This paper attacks the challenging problem of zero-example video retrieval.

155
17 Sep 2018