Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

15 Feb 2019 · Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

Detecting visual relationships, i.e., <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task, approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply supervised two-branch architecture, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal attentional mechanism that exploits spatio-linguistic similarities in a low-dimensional space...
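
To make the abstract's two main ingredients concrete, below is a minimal PyTorch sketch of (a) a multimodal attention module that fuses spatial and linguistic cues to re-weight visual features, and (b) a translation-embedding head in which the predicate acts as a translation vector between subject and object embeddings (s + p ≈ o, in the spirit of VTransE, which the paper's title alludes to). All module names, feature dimensions, and the exact fusion scheme are illustrative assumptions, not the authors' implementation; the paper's deep supervision (intermediate losses along each branch) is also omitted for brevity.

```python
import torch
import torch.nn as nn


class MultimodalAttention(nn.Module):
    """Hypothetical sketch: project spatial and linguistic cues into a
    shared low-dimensional space, then use the fused representation to
    produce per-channel attention weights over visual features."""

    def __init__(self, visual_dim=512, spatial_dim=8, lang_dim=300, attn_dim=64):
        super().__init__()
        self.spatial_proj = nn.Linear(spatial_dim, attn_dim)  # box-pair geometry
        self.lang_proj = nn.Linear(lang_dim, attn_dim)        # class word vectors
        self.attn = nn.Linear(attn_dim, visual_dim)

    def forward(self, visual, spatial, lang):
        # Fuse the two modalities in the low-dimensional attention space.
        fused = torch.tanh(self.spatial_proj(spatial) + self.lang_proj(lang))
        weights = torch.sigmoid(self.attn(fused))  # attention in [0, 1]
        return visual * weights                    # attended visual features


class TranslationEmbedding(nn.Module):
    """Translation-embedding head: the predicate embedding is modeled as
    the difference o - s in the learned space, then classified."""

    def __init__(self, feat_dim=512, embed_dim=128, num_predicates=70):
        super().__init__()
        self.subj_proj = nn.Linear(feat_dim, embed_dim)
        self.obj_proj = nn.Linear(feat_dim, embed_dim)
        self.pred_cls = nn.Linear(embed_dim, num_predicates)

    def forward(self, subj_feat, obj_feat):
        # s + p ≈ o  =>  p ≈ o - s
        pred_embed = self.obj_proj(obj_feat) - self.subj_proj(subj_feat)
        return self.pred_cls(pred_embed)  # predicate logits


if __name__ == "__main__":
    attn = MultimodalAttention()
    head = TranslationEmbedding()
    v_subj, v_obj = torch.randn(4, 512), torch.randn(4, 512)  # ROI features
    spatial = torch.randn(4, 8)    # e.g., normalized box-pair geometry
    lang = torch.randn(4, 300)     # e.g., averaged class word embeddings
    logits = head(attn(v_subj, spatial, lang), attn(v_obj, spatial, lang))
    print(logits.shape)            # torch.Size([4, 70])
```

In this sketch the same attention module conditions both the subject and object features, standing in for the paper's two visual branches; classifying the difference of the two projections is one standard way to realize the translation constraint as a predicate classifier.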

