The Ad-hoc search task completed a three-year cycle from 2016 to 2018. Its goal is to model the use case of an end user who searches, using textual sentence queries, for segments of video containing persons, objects, activities, locations, etc., and combinations of these. The Internet Archive (IACC.3) dataset was used from 2016 to 2018; starting in 2019, a new data collection based on Vimeo Creative Commons (V3C) will be adopted to support the task for at least three more years.

Given the test collection (V3C1 or IACC.3), the master shot boundary reference, and a set of Ad-hoc queries (approx. 30 queries) released by NIST, return for each query a list of at most 1000 shot IDs from the test collection, ranked according to their likelihood of matching the query.
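The ranking step described above can be sketched as follows. This is a minimal illustration, not an official NIST tool or submission format: it assumes some retrieval model has already produced a relevance score per shot for each query, and the names `rank_shots`, `build_run`, and the example query and shot IDs are hypothetical.

```python
from typing import Dict, List

MAX_RESULTS = 1000  # per-query cap stated in the task description


def rank_shots(scores: Dict[str, float], limit: int = MAX_RESULTS) -> List[str]:
    """Return shot IDs sorted by descending score, truncated to `limit`.

    `scores` maps a shot ID to a relevance score from any model
    (assumption: a higher score means the shot is more likely to match).
    """
    # Sort by score, highest first; break ties by shot ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [shot_id for shot_id, _ in ordered[:limit]]


def build_run(query_scores: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Build one ranked shot list per query ID."""
    return {query_id: rank_shots(s) for query_id, s in query_scores.items()}


if __name__ == "__main__":
    # Hypothetical query ID and shot IDs, for illustration only.
    demo = {"query_1": {"shot_a": 0.12, "shot_b": 0.87, "shot_c": 0.55}}
    for query_id, ranked in build_run(demo).items():
        print(query_id, ranked)
```

Breaking ties by shot ID keeps the output reproducible across runs; how the scores themselves are computed is left entirely to the retrieval model.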

Most implemented papers

Dual Encoding for Zero-Example Video Retrieval

danieljf24/dual_encoding CVPR 2019

This paper attacks the challenging problem of zero-example video retrieval.

W2VV++: Fully Deep Learning for Ad-hoc Video Search

li-xirong/w2vvpp ACM Multimedia 2019

The backbone of our method is the proposed W2VV++ model, a super version of Word2VisualVec (W2VV) previously developed for visual-to-text matching.

Dual Encoding for Video Retrieval by Text

danieljf24/hybrid_space 10 Sep 2020

In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.

SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries

li-xirong/sea 24 Nov 2020

Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders.

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains

LADI-Dataset/ladi-overview 27 Apr 2021

In total, 29 teams from various research organizations worldwide completed one or more of the following six tasks: 1.

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

ruc-aimc-lab/laff 3 Dec 2021

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval.

(Un)likelihood Training for Interpretable Embedding

nikkiwoo-gh/ITV 1 Jul 2022

Cross-modal representation learning has become a new normal for bridging the semantic gap between text and visual data.

An overview on the evaluated video retrieval tasks at TRECVID 2022

LADI-Dataset/ladi-overview 22 Jun 2023

The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, task-based evaluation supported by metrology.

Interpretable Embedding for Ad-hoc Video Search

nikkiwoo-gh/Dual-task-video-retrieval 19 Feb 2024

Answering queries with semantic concepts has long been the mainstream approach for video search.