From Node to Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection

Zero-Shot Detection (ZSD), which aims at localizing and recognizing unseen objects in a complicated scene, usually leverages the visual and semantic information of individual objects alone. However, human scene understanding goes beyond recognizing individual objects separately: contextual information among multiple objects, such as visual relational information (e.g., visually similar objects) and semantic relational information (e.g., co-occurrences), is helpful for understanding a visual scene. In this paper, we verify that contextual information plays a more important role in ZSD than in traditional object detection. To make full use of such information, we propose a new end-to-end ZSD method, the GRaph Aligning Network (GRAN), based on graph modeling and reasoning, which simultaneously considers the visual and semantic information of multiple objects instead of individual objects. Specifically, we formulate a Visual Relational Graph (VRG) and a Semantic Relational Graph (SRG), where the nodes are the objects in the image and the semantic representations of classes, respectively, and the edges are the relevance between nodes in each graph. To characterize the mutual effect between the two modalities, the two graphs are further merged into a heterogeneous Visual-Semantic Relational Graph (VSRG), where modal translators are designed for the two subgraphs to map modal information into a common space for communication, and message passing among nodes is enforced to refine their representations. Comprehensive experiments on the MSCOCO dataset demonstrate the advantage of our method over the state of the art, and qualitative analysis suggests the validity of using contextual information.
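To make the graph-reasoning idea in the abstract concrete, below is a minimal PyTorch sketch of one message-passing round on a heterogeneous visual-semantic graph. Everything here is an illustrative assumption rather than the authors' GRAN implementation: the module and parameter names (VSRGLayer, vis2common, sem2common), the dimensions, the way the adjacency blocks are combined, and the GCN-style update rule are all hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VSRGLayer(nn.Module):
    """One assumed round of message passing on a heterogeneous
    Visual-Semantic Relational Graph (VSRG).

    Visual nodes:   region features of objects in the image (N_v x D_v)
    Semantic nodes: class embeddings, e.g. word vectors     (N_s x D_s)
    """

    def __init__(self, d_visual, d_semantic, d_common):
        super().__init__()
        # Modal translators: project each modality into a common space
        # so visual and semantic nodes can exchange messages.
        self.vis2common = nn.Linear(d_visual, d_common)
        self.sem2common = nn.Linear(d_semantic, d_common)
        self.update = nn.Linear(d_common, d_common)

    def forward(self, vis_feats, sem_feats, a_vv, a_ss, a_vs):
        # a_vv: visual-visual edges (e.g. feature similarity), N_v x N_v
        # a_ss: semantic-semantic edges (e.g. co-occurrence),  N_s x N_s
        # a_vs: cross-modal edges linking regions to classes,  N_v x N_s
        h_v = self.vis2common(vis_feats)   # N_v x d_common
        h_s = self.sem2common(sem_feats)   # N_s x d_common

        # Stack both node sets into one heterogeneous graph and build
        # its block adjacency matrix, then row-normalize it.
        h = torch.cat([h_v, h_s], dim=0)
        top = torch.cat([a_vv, a_vs], dim=1)
        bottom = torch.cat([a_vs.t(), a_ss], dim=1)
        adj = torch.cat([top, bottom], dim=0)
        adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-6)

        # One GCN-style propagation step refines every node with
        # context from its visual and semantic neighbors.
        h = F.relu(self.update(adj @ h))
        n_v = vis_feats.size(0)
        return h[:n_v], h[n_v:]   # refined visual / semantic node features

# Toy usage with random inputs; the sizes (2048-d region features,
# 300-d word vectors, 80 classes) are assumptions for illustration.
layer = VSRGLayer(d_visual=2048, d_semantic=300, d_common=512)
vis = torch.randn(5, 2048)    # 5 region proposals
sem = torch.randn(80, 300)    # 80 class word embeddings
a_vv = torch.rand(5, 5)       # visual similarity edges
a_ss = torch.rand(80, 80)     # class co-occurrence edges
a_vs = torch.rand(5, 80)      # region-to-class affinity edges
vis_out, sem_out = layer(vis, sem, a_vv, a_ss, a_vs)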


Datasets

MS-COCO

Results
Task                                    Dataset  Model  Metric Name  Metric Value  Global Rank
Generalized Zero-Shot Object Detection  MS-COCO  GRAN   HM(mAP)      20.40         #6
Generalized Zero-Shot Object Detection  MS-COCO  GRAN   HM(Recall)   62.82         #1
Zero-Shot Object Detection              MS-COCO  GRAN   mAP          14.90         #6
Zero-Shot Object Detection              MS-COCO  GRAN   Recall       62.70         #3

Methods


No methods listed for this paper.