Search Results for author: Dedan Chang

CLIP2TV: Align, Match and Distill for Video-Text Retrieval

Modern video-text retrieval frameworks consist of three parts: a video encoder, a text encoder, and a similarity head.
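
The three-part layout described above can be illustrated with a minimal sketch. This is not CLIP2TV's implementation: the function names (`encode_video`, `encode_text`, `similarity_head`, `rank_videos`) are hypothetical, the "encoders" are plain mean pooling over precomputed feature vectors standing in for learned networks, and the similarity head is assumed to be cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_pool(features):
    """Average a list of equal-length feature vectors into one vector."""
    count = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / count for d in range(dim)]


def encode_video(frame_features):
    # Placeholder video encoder (assumption): mean-pool per-frame features.
    return l2_normalize(mean_pool(frame_features))


def encode_text(token_features):
    # Placeholder text encoder (assumption): mean-pool per-token features.
    return l2_normalize(mean_pool(token_features))


def similarity_head(text_emb, video_emb):
    # Dot product of unit vectors, i.e. cosine similarity (assumption).
    return sum(t * v for t, v in zip(text_emb, video_emb))


def rank_videos(text_emb, video_embs):
    """Return video indices sorted from most to least similar to the query."""
    scores = [similarity_head(text_emb, v) for v in video_embs]
    return sorted(range(len(video_embs)), key=lambda i: scores[i], reverse=True)


# Toy data: two videos with two frames each, one single-token query.
videos = [
    encode_video([[1.0, 0.0], [1.0, 0.0]]),
    encode_video([[0.0, 1.0], [0.0, 1.0]]),
]
query = encode_text([[0.9, 0.1]])
ranking = rank_videos(query, videos)
```

In this toy setup the query points mostly along the first axis, so the first video ranks above the second. A real system would replace the pooling functions with trained encoders while keeping the same encode-then-score structure.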

Ranked #12 on Video Retrieval on MSR-VTT-1kA (using extra training data)
