8 papers with code • 3 benchmarks • 3 datasets
Scene graphs are a compact and explicit representation successfully used in a variety of 2D scene understanding tasks.
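A scene graph is typically a set of subject-predicate-object triples over the entities in an image. The sketch below is a minimal illustration of that representation, not code from any of the listed papers; the `Relation` class and helper are hypothetical, and the example predicates are borrowed from the blurbs on this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """One edge of a scene graph: subject --predicate--> object."""
    subject: str    # e.g. "woman"
    predicate: str  # e.g. "standing on"
    obj: str        # e.g. "beach"

def scene_graph_to_triples(relations):
    """Flatten the graph into plain (subject, predicate, object) tuples."""
    return [(r.subject, r.predicate, r.obj) for r in relations]

# A tiny two-edge scene graph for a hypothetical beach image.
graph = [
    Relation("woman", "standing on", "beach"),
    Relation("woman", "looking at", "child"),
]
print(scene_graph_to_triples(graph))
# → [('woman', 'standing on', 'beach'), ('woman', 'looking at', 'child')]
```

In full scene graph generation systems, the subject and object slots are grounded in detected regions (bounding boxes, or panoptic segments in the PSG setting) rather than plain strings.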
Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes.
In this work, we propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
To quantify how well LOGIN captures relational direction, a new diagnostic task called Bidirectional Relationship Classification (BRC) is also proposed.
To this end, we propose a new classification-then-grounding framework for VidSGG, which avoids all three overlooked drawbacks.
Scene graph generation is a sophisticated task because there is no specific recognition pattern to rely on (e.g., "looking at" and "near" show no conspicuous visual difference, whereas "near" can hold between entities of very different morphology).
The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".