Video Retrieval
221 papers with code • 18 benchmarks • 31 datasets
The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the text query. Typically, the videos are returned as a ranked list of candidates and scored via standard retrieval metrics such as Recall@K and median rank.
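As a minimal sketch of how such evaluation works, the snippet below computes Recall@K and median rank from a query-by-video similarity matrix. The matrix, the helper name `retrieval_metrics`, and the convention that query `i`'s ground-truth video sits at index `i` are illustrative assumptions, not taken from any particular benchmark's evaluation code.

```python
import numpy as np

def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Recall@K and median rank from a similarity matrix.

    sim[i, j] is the similarity of text query i to candidate video j;
    the ground-truth video for query i is assumed to be at index i
    (a common convention in paired text-video test sets).
    """
    # Rank of the correct video for each query (0 = retrieved first).
    order = np.argsort(-sim, axis=1)  # candidate indices, best first
    ranks = np.argmax(order == np.arange(len(sim))[:, None], axis=1)
    metrics = {f"R@{k}": float(np.mean(ranks < k)) * 100 for k in ks}
    metrics["MedR"] = float(np.median(ranks)) + 1  # 1-indexed median rank
    return metrics

# Toy example: 3 queries, 3 candidate videos.
sim = np.array([
    [0.9, 0.1, 0.3],  # query 0's video is ranked first  -> rank 0
    [0.2, 0.4, 0.8],  # query 1's video is ranked second -> rank 1
    [0.1, 0.2, 0.7],  # query 2's video is ranked first  -> rank 0
])
print(retrieval_metrics(sim))
```

In practice the similarity matrix comes from scoring embedded queries against embedded videos (e.g. cosine similarity of cross-modal features); the metric computation itself is model-agnostic.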
Libraries
Use these libraries to find Video Retrieval models and implementations.
Latest papers with no code
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models
Our contributions encompass the development of an innovative interactive image retrieval system, the integration of an LLM-based denoiser, the curation of a meticulously designed evaluation dataset, and thorough experimental validation.
Learning text-to-video retrieval from image captioning
In this paper, we make use of this progress and instantiate the image experts from two types of models: a text-to-image retrieval model to provide an initial backbone, and image captioning models to provide supervision signal into unlabeled videos.
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap.
ProTA: Probabilistic Token Aggregation for Text-Video Retrieval
Text-video retrieval aims to find the most relevant cross-modal samples for a given query.
Event-aware Video Corpus Moment Retrieval
Video Corpus Moment Retrieval (VCMR) is a practical video retrieval task focused on identifying a specific moment within a vast corpus of untrimmed videos using a natural language query.
Video Editing for Video Retrieval
The teacher model is employed to edit the clips in the training set whereas the student model trains on the edited clips.
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings and extracts the audiovisual features most informative for the corresponding text.
Distilling Vision-Language Models on Millions of Videos
Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.
Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks
Compared to conventional textual retrieval, the main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content.
Detours for Navigating Instructional Videos
We introduce the video detours problem for navigating instructional videos.