Video to Text Retrieval
7 papers with code • 2 benchmarks • 2 datasets
Most implemented papers
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets.
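Retrieval results on benchmarks like MSR-VTT are conventionally reported as Recall@K. The sketch below shows the standard computation, assuming a similarity matrix where caption i is the ground-truth match for video i; it is a generic illustration, not this paper's evaluation code.

```python
# Hypothetical sketch: standard Recall@K for video-to-text retrieval.
# `sims` is an (N_videos, N_texts) similarity matrix; ground truth
# assumes text i describes video i.
import numpy as np

def recall_at_k(sims: np.ndarray, k: int) -> float:
    """Fraction of videos whose matching caption ranks in the top-k."""
    ranks = (-sims).argsort(axis=1)           # captions sorted by descending score
    gt = np.arange(sims.shape[0])[:, None]    # ground-truth caption index per video
    hit = (ranks[:, :k] == gt).any(axis=1)
    return float(hit.mean())

# Toy usage with random scores:
sims = np.random.randn(100, 100)
print({f"R@{k}": recall_at_k(sims, k) for k in (1, 5, 10)})
```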
Bridging Video-text Retrieval with Multiple Choice Questions
As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
We present the CLIP2Video network to transfer the image-language pre-training model to video-text retrieval in an end-to-end manner.
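A minimal sketch of the underlying idea, transferring an image CLIP model to video by pooling per-frame features into a video embedding. CLIP2Video adds temporal modeling blocks on top of this; the Hugging Face checkpoint name and the stand-in frames here are illustrative assumptions, not the paper's setup.

```python
# Sketch only: mean-pooled per-frame CLIP features as a video embedding,
# scored against candidate captions by cosine similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frames = [Image.new("RGB", (224, 224)) for _ in range(8)]  # stand-in for sampled frames
captions = ["a man plays guitar", "a dog runs on the beach"]

with torch.no_grad():
    pix = processor(images=frames, return_tensors="pt")
    frame_feats = model.get_image_features(**pix)        # (8, 512) per-frame features
    video_feat = frame_feats.mean(dim=0, keepdim=True)   # temporal mean pooling
    txt = processor(text=captions, return_tensors="pt", padding=True)
    text_feats = model.get_text_features(**txt)          # (2, 512) caption features

video_feat = video_feat / video_feat.norm(dim=-1, keepdim=True)
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
print(video_feat @ text_feats.T)                         # retrieval scores
```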
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on.
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Dominant pre-training work for video-text retrieval mainly adopts "dual-encoder" architectures to enable efficient retrieval, where two separate encoders are used to contrast global video and text representations, but detailed local semantics are ignored.
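For reference, the dual-encoder contrast described above typically reduces to a symmetric InfoNCE loss over global embeddings; the sketch below shows that baseline objective (which MILES augments), with random tensors standing in for the encoder outputs.

```python
# Sketch of the "dual-encoder" contrastive objective: two separate encoders
# produce global video/text embeddings, trained with symmetric InfoNCE.
import torch
import torch.nn.functional as F

def dual_encoder_loss(video_emb: torch.Tensor, text_emb: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of matched (video, text) pairs."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(v.size(0))         # row i matches column i
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Toy usage: random "global" embeddings standing in for encoder outputs.
loss = dual_encoder_loss(torch.randn(32, 256), torch.randn(32, 256))
print(loss.item())
```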
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian
Since pretraining resources with Indonesian sentences are relatively limited, the applicability of those approaches to our dataset remains questionable.
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
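One plausible reading of prototype-based uncertainty, sketched below, scores an embedding by how diffusely it matches a set of learnable prototypes (higher assignment entropy suggesting a more ambiguous input). This is a hedged illustration of the general idea, not PAU's exact formulation; the class and parameter names are hypothetical.

```python
# Hedged illustration, not the paper's method: normalized entropy of an
# embedding's soft assignment over learnable prototypes as an uncertainty score.
import torch
import torch.nn.functional as F

class PrototypeUncertainty(torch.nn.Module):
    def __init__(self, num_prototypes: int = 16, dim: int = 256):
        super().__init__()
        self.prototypes = torch.nn.Parameter(torch.randn(num_prototypes, dim))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        """Return a per-sample uncertainty score in [0, 1]."""
        sims = F.normalize(emb, dim=-1) @ F.normalize(self.prototypes, dim=-1).T
        probs = sims.softmax(dim=-1)          # soft assignment over prototypes
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
        return entropy / torch.log(torch.tensor(float(probs.size(-1))))

# Toy usage: score a batch of cross-modal embeddings.
scorer = PrototypeUncertainty()
print(scorer(torch.randn(4, 256)))
```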