Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders.
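A minimal sketch of the general idea of combining diverse sentence encoders: run each encoder on the same sentence and fuse their outputs into one joint representation. The two toy encoders below (a bag-of-words encoder and a character-trigram encoder) and the concatenation fusion are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def bow_encoder(sentence, dim=64):
    # Toy bag-of-words encoder: deterministically hash each token
    # into a fixed-size count vector (stand-in for a real encoder).
    vec = np.zeros(dim)
    for token in sentence.split():
        vec[sum(map(ord, token)) % dim] += 1.0
    return vec

def char_ngram_encoder(sentence, dim=64, n=3):
    # Toy character-trigram encoder, another stand-in for a
    # different kind of sentence encoder.
    vec = np.zeros(dim)
    for i in range(len(sentence) - n + 1):
        vec[sum(map(ord, sentence[i:i + n])) % dim] += 1.0
    return vec

def assemble(sentence):
    # Fuse the diverse encoders by concatenation, then L2-normalize
    # so downstream similarity scores are comparable.
    emb = np.concatenate([bow_encoder(sentence), char_ngram_encoder(sentence)])
    norm = np.linalg.norm(emb)
    return emb / norm if norm > 0 else emb

embedding = assemble("a man rides a bicycle down the street")
```

In practice each component would be a learned neural encoder and the fusion would be trained end to end; the point here is only the interface: many encoders in, one joint sentence representation out.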
In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into their own powerful dense representations.
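A minimal sketch of the dual-encoding setup described above, assuming the common design of two separate branches that project video and query features into a shared space where relevance is scored by cosine similarity. The dimensions, mean-pooling encoders, and random projection matrices are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(features, proj):
    # Mean-pool the per-frame (or per-word) features, project them
    # into the shared embedding space, and L2-normalize.
    pooled = features.mean(axis=0)
    emb = proj @ pooled
    return emb / np.linalg.norm(emb)

# Hypothetical dimensions: 2048-d frame features and 300-d word
# vectors, both mapped into a 512-d common space. In the real model
# these projections would be learned, not random.
W_video = rng.standard_normal((512, 2048)) * 0.01
W_query = rng.standard_normal((512, 300)) * 0.01

video_frames = rng.standard_normal((30, 2048))  # 30 sampled frames
query_words = rng.standard_normal((5, 300))     # 5 word embeddings

v = encode(video_frames, W_video)
q = encode(query_words, W_query)

similarity = float(v @ q)  # cosine similarity in the shared space
```

Because both branches emit unit-norm vectors in the same space, ranking candidate videos for a query reduces to a dot product per video.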
Predicting the relevance between two given videos with respect to their visual content is a key component for content-based video recommendation and retrieval.
This paper contributes to cross-lingual image annotation and retrieval in terms of data and baseline methods.