Visual relationship detection with deep structural ranking

27 Apr 2018  ·  Kongming Liang, Yuhong Guo, Hong Chang, Xilin Chen ·

Visual relationship detection aims to describe the interactions between pairs of objects. Unlike individual object learning tasks, the number of possible relationships is much larger, which makes them hard to recognize based solely on the visual appearance of objects. In addition, because human annotation effort is limited, the annotations for visual relationships are usually incomplete, which increases the difficulty of both model training and evaluation. In this paper, we propose a novel framework, called Deep Structural Ranking, for visual relationship detection. To complement the representational ability of visual appearance, we integrate multiple cues for predicting the relationships contained in an input image. Moreover, we design a new ranking objective function that enforces annotated relationships to have higher relevance scores. Unlike previous works, our proposed method can both facilitate the co-occurrence of relationships and mitigate the incompleteness problem. Experimental results show that our proposed method outperforms the state of the art on two widely used datasets. We also demonstrate its superiority in detecting zero-shot relationships.
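The ranking objective described above can be sketched as a margin-based hinge loss that pushes annotated (ground-truth) relationship scores above unannotated ones, while never treating unannotated candidates as hard negatives — which is what mitigates annotation incompleteness. The function name, margin value, and example scores below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def structural_ranking_loss(scores, annotated, margin=1.0):
    """Hedged sketch of a structural ranking loss.

    scores    : (K,) relevance scores for K candidate relationships
                of one object pair.
    annotated : (K,) boolean mask of ground-truth relationships.

    Each annotated relationship should outscore each unannotated
    one by at least `margin`; unannotated candidates incur no loss
    of their own, so missing labels are not punished as negatives.
    """
    pos = scores[annotated]            # ground-truth relationship scores
    neg = scores[~annotated]           # unannotated candidate scores
    # Hinge over all (positive, negative) score pairs.
    diffs = margin - (pos[:, None] - neg[None, :])
    return np.maximum(diffs, 0.0).mean()

# Example: 5 candidate predicates for one object pair,
# two of them annotated as ground truth.
scores = np.array([2.5, 0.3, 1.8, -0.2, 0.1])
annotated = np.array([True, False, True, False, False])
loss = structural_ranking_loss(scores, annotated)
```

Because several predicates can be annotated for the same object pair, a pairwise ranking loss of this form naturally accommodates co-occurring relationships, unlike a single-label softmax.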

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Visual Relationship Detection | VRD Predicate Detection | vrd-dsr | R@100 | 93.18 | #2 |
| Visual Relationship Detection | VRD Predicate Detection | vrd-dsr | R@50 | 86.01 | #2 |
| Visual Relationship Detection | VRD Relationship Detection | vrd-dsr | R@100 | 23.29 | #2 |
| Visual Relationship Detection | VRD Relationship Detection | vrd-dsr | R@50 | 19.03 | #3 |