Ad-hoc video search

9 papers with code • 5 benchmarks • 8 datasets

The Ad-hoc search task completed a three-year cycle (2016-2018) whose goal was to model the end-user search use case: a user searching (with textual sentence queries) for segments of video containing persons, objects, activities, locations, etc., and combinations thereof. While the Internet Archive (IACC.3) dataset was used from 2016 to 2018, starting in 2019 a new data collection based on Vimeo Creative Commons (V3C) was adopted to support the task for at least three more years.

Given the test collection (V3C1 or IACC.3), the master shot boundary reference, and a set of Ad-hoc queries (approximately 30) released by NIST, return for each query a list of at most 1000 shot IDs from the test collection, ranked by the likelihood that the shot matches the query.
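In embedding-based systems (the dominant approach among the papers below), this amounts to scoring every shot against the query in a joint embedding space and keeping the top 1000. A minimal sketch of that ranking step, assuming precomputed query and shot embeddings as NumPy arrays (the function name and shapes are illustrative, not from any specific system):

```python
import numpy as np

def rank_shots(query_vec, shot_vecs, shot_ids, top_k=1000):
    """Rank shots by cosine similarity to a query embedding.

    query_vec: (d,) query embedding.
    shot_vecs: (n, d) matrix of shot embeddings.
    shot_ids:  list of n shot IDs aligned with shot_vecs rows.
    Returns up to top_k shot IDs, most similar first.
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    s = shot_vecs / np.linalg.norm(shot_vecs, axis=1, keepdims=True)
    scores = s @ q
    # Sort descending by score and truncate to the 1000-shot limit.
    order = np.argsort(-scores)[:top_k]
    return [shot_ids[i] for i in order]
```

The truncated, ranked list per query is then written out in the run format NIST expects and evaluated (TRECVID uses inferred average precision for this task).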

Most implemented papers

Dual Encoding for Zero-Example Video Retrieval

danieljf24/dual_encoding CVPR 2019

This paper attacks the challenging problem of zero-example video retrieval.

W2VV++: Fully Deep Learning for Ad-hoc Video Search

li-xirong/w2vvpp ACM Multimedia 2019

The backbone of our method is the proposed W2VV++ model, a super version of Word2VisualVec (W2VV) previously developed for visual-to-text matching.

Dual Encoding for Video Retrieval by Text

danieljf24/hybrid_space 10 Sep 2020

In this paper, we propose a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.

SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries

li-xirong/sea 24 Nov 2020

Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders.

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains

LADI-Dataset/ladi-overview 27 Apr 2021

In total, 29 teams from various research organizations worldwide completed one or more of six tasks.

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

ruc-aimc-lab/laff 3 Dec 2021

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval.

(Un)likelihood Training for Interpretable Embedding

nikkiwoo-gh/ITV 1 Jul 2022

Cross-modal representation learning has become a new normal for bridging the semantic gap between text and visual data.

An overview on the evaluated video retrieval tasks at TRECVID 2022

LADI-Dataset/ladi-overview 22 Jun 2023

The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation whose goal is to promote progress in research and development of content-based exploitation and retrieval of information from digital video, via open, task-based evaluation supported by metrology.

Interpretable Embedding for Ad-hoc Video Search

nikkiwoo-gh/Dual-task-video-retrieval 19 Feb 2024

Answering queries with semantic concepts has long been the mainstream approach to video search.