Scene Graph Generation

130 papers with code • 5 benchmarks • 7 datasets

A scene graph is a structured representation of an image in which nodes correspond to object bounding boxes with their object categories, and edges correspond to pairwise relationships between those objects. The task of Scene Graph Generation is to generate a visually grounded scene graph that most accurately correlates with an image.

Source: Scene Graph Generation by Iterative Message Passing
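As a rough illustration of the definition above, a scene graph can be held in a small data structure: a list of object nodes (category plus bounding box) and a list of directed, predicate-labeled edges between node indices. This is a minimal sketch, not the representation used by any particular paper on this page; the class names, the (x_min, y_min, x_max, y_max) box convention, and the example objects are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectNode:
    """A node: one object with its category and bounding box (hypothetical layout)."""
    category: str
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Relation:
    """A directed edge from a subject node to an object node, labeled with a predicate."""
    subj: int       # index into SceneGraph.nodes
    predicate: str
    obj: int        # index into SceneGraph.nodes


@dataclass
class SceneGraph:
    nodes: List[ObjectNode] = field(default_factory=list)
    edges: List[Relation] = field(default_factory=list)

    def add_object(self, category: str, box: Tuple[float, float, float, float]) -> int:
        """Append a node and return its index."""
        self.nodes.append(ObjectNode(category, box))
        return len(self.nodes) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Append an edge; both endpoints must already exist."""
        n = len(self.nodes)
        if not (0 <= subj < n and 0 <= obj < n):
            raise IndexError("relation refers to a node that does not exist")
        self.edges.append(Relation(subj, predicate, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return <subject, predicate, object> category triplets, one per edge."""
        return [
            (self.nodes[e.subj].category, e.predicate, self.nodes[e.obj].category)
            for e in self.edges
        ]


# Invented example: two objects and one relationship.
graph = SceneGraph()
man = graph.add_object("man", (10.0, 20.0, 110.0, 220.0))
horse = graph.add_object("horse", (60.0, 80.0, 260.0, 300.0))
graph.add_relation(man, "riding", horse)
print(graph.triplets())  # [('man', 'riding', 'horse')]
```

Storing edges as index pairs rather than category pairs keeps two instances of the same category (for example, two people) distinguishable, which matters because the graph is grounded in specific boxes.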

Most implemented papers

Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

bknyaz/sgg 17 May 2020

We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA.

Generative Compositional Augmentations for Scene Graph Prediction

bknyaz/sgg ICCV 2021

However, test images might contain zero- and few-shot compositions of objects and relationships, e.g. <cup, on, surfboard>.
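To make the notion of zero- and few-shot compositions concrete, the sketch below buckets test triplets by how often the same <subject, predicate, object> composition appears in the training annotations. It is an illustrative assumption, not the evaluation protocol of the paper above: the function name, the few-shot cutoff of 10, and the toy triplets are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Triplet = Tuple[str, str, str]


def split_by_train_frequency(
    train_triplets: Iterable[Triplet],
    test_triplets: Iterable[Triplet],
    few_shot_max: int = 10,  # assumed cutoff; choose to suit the dataset
) -> Dict[str, List[Triplet]]:
    """Group test triplets by how many times they occur in the training set."""
    counts = Counter(train_triplets)
    buckets: Dict[str, List[Triplet]] = {"zero_shot": [], "few_shot": [], "frequent": []}
    for triplet in test_triplets:
        n = counts[triplet]  # Counter returns 0 for unseen compositions
        if n == 0:
            buckets["zero_shot"].append(triplet)
        elif n <= few_shot_max:
            buckets["few_shot"].append(triplet)
        else:
            buckets["frequent"].append(triplet)
    return buckets


# Invented toy annotations.
train = [("cup", "on", "table")] * 12 + [("person", "on", "surfboard")] * 3
test = [("cup", "on", "surfboard"), ("person", "on", "surfboard"), ("cup", "on", "table")]

buckets = split_by_train_frequency(train, test)
print(buckets["zero_shot"])  # [('cup', 'on', 'surfboard')]
```

Note that <cup, on, surfboard> is zero-shot here even though "cup", "on", and "surfboard" each appear in training: it is the composition that is unseen, not its parts.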

Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation

Kenneth-Wong/het-eccv20 ECCV 2020

A scene graph aims to faithfully reveal humans' perception of image content.

PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation

coldmanck/recovering-unbiased-scene-graphs 2 Sep 2020

Today, the scene graph generation (SGG) task is largely limited in realistic scenarios, mainly due to the extremely long-tailed bias of the predicate annotation distribution.
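One simple way to see a long-tailed predicate distribution is to measure what share of all relationship annotations the few most frequent predicates cover. The snippet below is a minimal sketch under that assumption; the helper name `head_share` and the toy counts are invented and do not come from any dataset listed here.

```python
from collections import Counter
from typing import Iterable


def head_share(predicates: Iterable[str], top_k: int = 1) -> float:
    """Fraction of annotations covered by the top_k most frequent predicates."""
    counts = Counter(predicates)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(n for _, n in counts.most_common(top_k))
    return head / total


# Invented toy annotations: two head predicates dominate the tail.
toy_predicates = ["on"] * 70 + ["has"] * 20 + ["riding"] * 7 + ["eating"] * 3
share = head_share(toy_predicates, top_k=2)
print(f"top-2 predicates cover {share:.0%} of annotations")  # top-2 predicates cover 90% of annotations
```

When a handful of generic predicates covers most annotations, a model can score well by always predicting them, which is the bias that the unbiased SGG methods on this page try to counteract.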

CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation

CYVincent/Scene-Graph-Transformer-CogTree 16 Sep 2020

We first build a cognitive structure, CogTree, to organize the relationships based on the predictions of a biased SGG model.

Are scene graphs good enough to improve Image Captioning?

iacercalixto/butd-image-captioning Asian Chapter of the Association for Computational Linguistics 2020

Overall, we find no significant difference between models that use scene graph features and models that only use object detection features across different captioning metrics, which suggests that existing scene graph generation models are still too noisy to be useful in image captioning.

Dense Relational Image Captioning via Multi-task Triple-Stream Networks

Dong-JinKim/DenseRelationalCaptioning 8 Oct 2020

To this end, we propose the multi-task triple-stream network (MTTSNet), which consists of three recurrent units, each responsible for one part of speech (POS), and which is trained by jointly predicting the correct caption and the POS of each word.

After All, Only The Last Neuron Matters: Comparing Multi-modal Fusion Functions for Scene Graph Generation

Karim-53/SGG 9 Nov 2020

From object segmentation to word vector representations, Scene Graph Generation (SGG) has become a complex task built upon numerous research results.

Context-Aware Scene Graph Generation With Seq2Seq Transformers

layer6ai-labs/sgg-seq2seq ICCV 2021

In this task, the model needs to detect objects and predict visual relationships between them.

Energy-Based Learning for Scene Graph Generation

mods333/energy-based-scene-graph CVPR 2021

The proposed formulation allows for efficiently incorporating the structure of scene graphs in the output space.