Zero-Shot Video Retrieval

6 papers with code • 7 benchmarks • 6 datasets

Most implemented papers

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

m-bain/frozen-in-time ICCV 2021

Our objective in this work is video-text retrieval, in particular a joint embedding that enables efficient text-to-video retrieval.

Revealing Single Frame Bias for Video-and-Language Learning

jayleicn/singularity 7 Jun 2022

Training an effective video-and-language model intuitively requires multiple frames as model inputs.

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

ninatu/everything_at_once 8 Dec 2021

Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

ninatu/everything_at_once CVPR 2022

In this work, we present a multi-modal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joint multi-modal embedding space.

Clover: Towards A Unified Video-Language Alignment and Fusion Model

leeyn-43/clover 16 Jul 2022

We then introduce Clover, a Correlated Video-Language pre-training method, towards a universal Video-Language model for solving multiple video understanding tasks with neither performance nor efficiency compromise.

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

opengvlab/internvideo 6 Dec 2022

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.
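
Several of the papers above describe text-to-video retrieval through a shared embedding space. The following is a minimal sketch of that retrieval step, not any listed paper's implementation: it assumes text and video embeddings have already been produced by some pretrained encoders, ranks videos by cosine similarity to a text query, and scores the ranking with Recall@K. The function names, the toy embeddings, and the convention that text i matches video i are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_videos(text_embedding, video_embeddings):
    """Return video indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(text_embedding, v) for v in video_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(text_embeddings, video_embeddings, k):
    """Fraction of queries whose matching video appears in the top k.

    Assumption: text i is paired with video i.
    """
    hits = 0
    for i, text in enumerate(text_embeddings):
        if i in rank_videos(text, video_embeddings)[:k]:
            hits += 1
    return hits / len(text_embeddings)


# Toy embeddings standing in for encoder outputs (hypothetical values).
videos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.1, 0.5]]

print(rank_videos(texts[2], videos))
print(recall_at_k(texts, videos, 1))
print(recall_at_k(texts, videos, 2))
```

In the zero-shot setting, the encoders would be applied to the target benchmark without fine-tuning on it; only the ranking and scoring shown here would run on the benchmark data.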