no code implementations • 10 Nov 2021 • Zijian Gao, Jingyu Liu, Weiqi Sun, Sheng Chen, Dedan Chang, Lili Zhao
Modern video-text retrieval frameworks basically consist of three parts: video encoder, text encoder and the similarity head.
Ranked #9 on
Video Retrieval
on MSR-VTT-1kA
(using extra training data)