The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video that corresponds to the text query. Typically, the videos are returned as a ranked list of candidates and scored with standard document retrieval metrics such as Recall@K and median rank.
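
As an illustration of how a ranked list is scored, here is a minimal sketch. It assumes a precomputed text-to-video similarity matrix in which `sim[i][j]` scores query `i` against video `j`, and that video `i` is the single ground-truth match for query `i`. The function names are hypothetical, and Recall@K plus median rank are only common choices; the exact metric set varies across benchmarks and papers.

```python
from statistics import median
from typing import List, Sequence


def ground_truth_ranks(sim: Sequence[Sequence[float]]) -> List[int]:
    """Return the 1-based rank of the correct video for each text query.

    Assumption: sim[i][j] is the similarity of query i to video j, and
    video i is the ground-truth match for query i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank = 1 + number of candidates scored strictly higher than the
        # ground truth (ties are resolved in the ground truth's favour).
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose ground-truth video appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks: Sequence[int]) -> float:
    """Median rank of the ground-truth video across all queries."""
    return median(ranks)


if __name__ == "__main__":
    # Toy 3x3 similarity matrix: rows are text queries, columns are videos.
    sim = [
        [0.9, 0.1, 0.3],
        [0.2, 0.4, 0.8],
        [0.5, 0.6, 0.7],
    ]
    ranks = ground_truth_ranks(sim)
    print(ranks)                    # [1, 2, 1]
    print(recall_at_k(ranks, 1))    # 0.6666666666666666
    print(recall_at_k(ranks, 5))    # 1.0
    print(median_rank(ranks))       # 1
```

Counting only strictly higher scores keeps the sketch simple. Real evaluation code may break ties differently, which can shift the reported numbers slightly.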


Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval

showlab/demovlp 15 Mar 2022

Recent dominant methods for video-language pre-training (VLP) learn transferable representations from raw pixels in an end-to-end manner, achieving strong performance on downstream video-language retrieval.

Disentangled Representation Learning for Text-Video Retrieval

towhee-io/towhee 14 Mar 2022

Cross-modality interaction is a critical component in Text-Video Retrieval (TVR), yet there has been little examination of how different influencing factors for computing interaction affect performance.

All in One: Exploring Unified Video-Language Pre-training

showlab/all-in-one CVPR 2023

In this work, we introduce, for the first time, an end-to-end video-language model, namely the all-in-one Transformer, that embeds raw video and textual signals into joint representations using a unified backbone architecture.

14 Mar 2022

Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data

shuyanzhou/wikihow_hierarchy ACL 2022

To this end, we develop a simple and efficient method that links steps (e.g., "purchase a camera") in an article to other articles with similar goals (e.g., "how to choose a camera"), recursively constructing the KB.

14 Mar 2022

Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval

gimpong/WWW22-HCQ 7 Feb 2022

By performing Asymmetric-Quantized Contrastive Learning (AQ-CL) across views, HCQ aligns texts and videos at coarse-grained and multiple fine-grained levels.

Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval

lijiabei-7/rivrl 23 Jan 2022

In this work, we concentrate on video representation learning, an essential component for text-to-video retrieval.

Self-supervised Video Representation Learning with Cascade Positive Retrieval

necla-ml/cpr 20 Jan 2022

Implementation-wise, CPR is complementary to pretext tasks and can be easily applied to previous work.

Bridging Video-text Retrieval with Multiple Choice Questions

towhee-io/towhee CVPR 2022

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

13 Jan 2022

Multi-Query Video Retrieval

princetonvisualai/mqvr 10 Jan 2022

Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years.

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

ninatu/everything_at_once CVPR 2022

In this work, we present a multi-modal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrates them into a fused representation in a joint multi-modal embedding space.

01 Jan 2022