|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information.
SOTA for Object Detection on Visual Genome
More specifically, we show that the statistical correlations between objects appearing in images and their relationships, can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions.
In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image.
Recent approaches on visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships.