Learning Language-Visual Embedding for Movie Understanding with Natural-Language

26 Sep 2016Atousa TorabiNiket TandonLeonid Sigal

Learning a joint language-visual embedding has a number of very appealing properties and can result in variety of practical application, including natural language image/video annotation and search. In this work, we study three different joint language-visual neural network model architectures... (read more)

PDF Abstract

Code


No code implementations yet. Submit your code now

Results from the Paper


TASK DATASET MODEL METRIC NAME METRIC VALUE GLOBAL RANK RESULT LEADERBOARD
Video Retrieval MSR-VTT C+LSTM+SA+FC7 text-to-video [email protected] 4.2 # 6
text-to-video [email protected] 19.9 # 6
text-to-video Median Rank 55 # 6
video-to-text [email protected] 12.9 # 6