Scene Graph Generation
130 papers with code • 5 benchmarks • 7 datasets
A scene graph is a structured representation of an image: nodes correspond to object bounding boxes with their object categories, and edges correspond to pairwise relationships between objects. The task of Scene Graph Generation is to generate a visually grounded scene graph that most accurately describes the image.
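As a concrete illustration of this representation, the sketch below defines a minimal scene graph as object nodes (bounding box plus category) and relationship edges. The class and field names are illustrative assumptions, not taken from any particular SGG library.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ObjectNode:
        """A node: one detected object with its category label."""
        box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
        category: str                            # e.g. "cup"

    @dataclass
    class RelationEdge:
        """A directed edge: a <subject, predicate, object> relationship."""
        subject_idx: int   # index into SceneGraph.objects
        predicate: str     # e.g. "on"
        object_idx: int

    @dataclass
    class SceneGraph:
        objects: List[ObjectNode] = field(default_factory=list)
        relations: List[RelationEdge] = field(default_factory=list)

    # A toy graph for an image of a cup resting on a table.
    graph = SceneGraph(
        objects=[ObjectNode((10, 20, 60, 80), "cup"),
                 ObjectNode((0, 70, 300, 200), "table")],
        relations=[RelationEdge(0, "on", 1)],  # <cup, on, table>
    )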
Most implemented papers
Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation
We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA.
Generative Compositional Augmentations for Scene Graph Prediction
However, test images might contain zero- and few-shot compositions of objects and relationships, e.g. <cup, on, surfboard>.
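To make the notion of zero- and few-shot compositions concrete, the hedged sketch below counts how often each <subject, predicate, object> triplet occurs in a training set and flags test triplets that are unseen or rare. The example triplets and the few-shot threshold are illustrative assumptions, not the paper's evaluation protocol.

    from collections import Counter

    # Hypothetical training and test triplets in <subject, predicate, object> form.
    train_triplets = [("cup", "on", "table"), ("person", "riding", "surfboard"),
                      ("cup", "on", "table")]
    test_triplets = [("cup", "on", "surfboard"), ("cup", "on", "table")]

    train_counts = Counter(train_triplets)

    for triplet in test_triplets:
        n = train_counts[triplet]
        if n == 0:
            kind = "zero-shot"      # composition never seen during training
        elif n <= 5:                # illustrative few-shot threshold
            kind = "few-shot"
        else:
            kind = "frequent"
        print(triplet, "->", kind, f"(seen {n} times in training)")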
Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
A scene graph aims to faithfully reveal humans' perception of image content.
PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation
Today, the scene graph generation (SGG) task is largely limited in realistic scenarios, mainly due to the extremely long-tailed bias of the predicate annotation distribution.
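A common baseline for countering such a long-tailed predicate distribution (not the PCPL method itself, just a generic rebalancing sketch with assumed counts) is to weight the predicate classification loss by inverse class frequency, so rare predicates contribute more per example:

    import torch
    import torch.nn as nn

    # Hypothetical predicate annotation counts from a long-tailed training set,
    # e.g. "on", "has", "riding", "eating".
    predicate_counts = torch.tensor([90000.0, 5000.0, 400.0, 25.0])

    # Inverse-frequency weights, rescaled so the average weight is 1.
    weights = 1.0 / predicate_counts
    weights = weights * len(weights) / weights.sum()
    criterion = nn.CrossEntropyLoss(weight=weights)

    logits = torch.randn(8, 4)           # scores for 8 object pairs over 4 predicates
    targets = torch.randint(0, 4, (8,))  # ground-truth predicate indices
    loss = criterion(logits, targets)    # rare predicates now carry larger loss weight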
CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation
We first build a cognitive structure CogTree to organize the relationships based on the prediction of a biased SGG model.
Are scene graphs good enough to improve Image Captioning?
Overall, we find no significant difference between models that use scene graph features and models that only use object detection features across different captioning metrics, which suggests that existing scene graph generation models are still too noisy to be useful in image captioning.
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
To this end, we propose the multi-task triple-stream network (MTTSNet), which consists of three recurrent units, one responsible for each part of speech (POS), trained by jointly predicting the correct captions and the POS of each word.
After All, Only The Last Neuron Matters: Comparing Multi-modal Fusion Functions for Scene Graph Generation
From object segmentation to word vector representations, Scene Graph Generation (SGG) has become a complex task built upon numerous research results.
Context-Aware Scene Graph Generation With Seq2Seq Transformers
In this task, the model needs to detect objects and predict visual relationships between them.
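The two-stage pipeline described in this snippet, detecting objects and then predicting a relationship for each ordered pair, can be sketched as a generic pairwise predicate classifier. The feature dimension, predicate count, and module names below are assumptions for illustration and do not reflect the paper's Seq2Seq Transformer architecture.

    import itertools
    import torch
    import torch.nn as nn

    class PairwisePredicateClassifier(nn.Module):
        """Generic second stage: score a predicate for every ordered pair of detected objects."""
        def __init__(self, obj_feat_dim=256, num_predicates=51):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(2 * obj_feat_dim, 512), nn.ReLU(),
                nn.Linear(512, num_predicates),
            )

        def forward(self, obj_feats):
            # obj_feats: (N, D) features of N detected objects (from any detector).
            pairs = list(itertools.permutations(range(obj_feats.size(0)), 2))
            pair_feats = torch.stack(
                [torch.cat([obj_feats[i], obj_feats[j]]) for i, j in pairs])
            return pairs, self.mlp(pair_feats)  # predicate logits per (subject, object) pair

    # Toy usage with random features standing in for detector outputs.
    feats = torch.randn(3, 256)
    pairs, logits = PairwisePredicateClassifier()(feats)  # 3 * 2 = 6 ordered pairs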
Energy-Based Learning for Scene Graph Generation
The proposed formulation allows for efficiently incorporating the structure of scene graphs in the output space.