SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

Scene graphs are a compact and explicit representation successfully used in a variety of 2D scene understanding tasks. This work proposes a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames. To this end, we aggregate PointNet features from primitive scene components by means of a graph neural network. We also propose a novel attention mechanism well suited for partial and missing graph data present in such an incremental reconstruction scenario. Although our proposed method is designed to run on submaps of the scene, we show it also transfers to entire 3D scenes. Experiments show that our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
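The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's architecture: it uses plain scaled dot-product attention as a stand-in, and the function name `masked_attention_aggregate`, the inputs, and the masking scheme are all assumptions made for illustration. It shows only how attention weights can be restricted to edges that currently exist, so that nodes with missing neighbors in a partial graph are handled gracefully.

```python
import numpy as np

def masked_attention_aggregate(node_feats, adjacency):
    """Aggregate neighbor features with softmax attention over existing edges only.

    node_feats: (N, D) array, one feature vector per segment (hypothetical input,
                standing in for per-segment PointNet features).
    adjacency:  (N, N) boolean array; adjacency[i, j] is True if j is a neighbor of i.
    Returns an (N, D) array; nodes with no neighbors get a zero vector.
    """
    node_feats = np.asarray(node_feats, dtype=float)
    adjacency = np.asarray(adjacency, dtype=bool)
    _, d = node_feats.shape

    # Scaled dot-product scores between every pair of nodes (illustrative choice).
    scores = node_feats @ node_feats.T / np.sqrt(d)
    # Mask out non-edges so they receive zero attention weight.
    scores = np.where(adjacency, scores, -np.inf)

    # Numerically stable softmax per row; rows without neighbors stay all-zero.
    row_max = np.max(scores, axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(scores - row_max)  # exp(-inf) evaluates to 0
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)

    return weights @ node_feats
```

In this sketch, an isolated node simply receives a zero message instead of producing NaNs, which is one simple way to cope with the incomplete neighborhoods that arise while a scene is still being reconstructed.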

CVPR 2021

Results from the Paper

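| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Scene Graph Generation | 3R-Scan | 3DSSG [Wald2020_3dssg] | Top-5 Accuracy | 0.66 | # 2 |
| Predicate Classification | 3R-Scan | SceneGraphFusion | Top-3 Accuracy | 0.97 | # 1 |
| Predicate Classification | 3R-Scan | SceneGraphFusion | Top-5 Accuracy | 0.99 | # 1 |
| Scene Graph Generation | 3R-Scan | SceneGraphFusion | Top-5 Accuracy | 0.87 | # 1 |
| 3D Object Classification | 3R-Scan | SceneGraphFusion | Top-5 Accuracy | 0.7 | # 1 |
| 3D Object Classification | 3R-Scan | SceneGraphFusion | Top-10 Accuracy | 0.8 | # 1 |
| Predicate Classification | 3R-Scan | 3DSSG [Wald2020_3dssg] | Top-3 Accuracy | 0.89 | # 2 |
| Predicate Classification | 3R-Scan | 3DSSG [Wald2020_3dssg] | Top-5 Accuracy | 0.93 | # 2 |
| 3D Object Classification | 3R-Scan | 3DSSG [Wald2020_3dssg] | Top-5 Accuracy | 0.68 | # 2 |
| 3D Object Classification | 3R-Scan | 3DSSG [Wald2020_3dssg] | Top-10 Accuracy | 0.78 | # 2 |
| Panoptic Segmentation | ScanNetV2 | SceneGraphFusion (NN mapping) | PQ | 31.5 | # 2 |
| Panoptic Segmentation | ScanNetV2 | SceneGraphFusion (NN mapping) | SQ | 72.9 | # 2 |
| Panoptic Segmentation | ScanNetV2 | SceneGraphFusion (NN mapping) | RQ | 42.2 | # 2 |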
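
Most of the figures above are Top-k accuracies. As a rough illustration of the metric, the sketch below counts a sample as correct when its true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy inputs are assumptions, and the exact evaluation protocol behind the reported numbers (for example, how relationship triplets are ranked) is not stated here and may differ.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: (N, C) array of per-class scores for N samples.
    labels: (N,) array of integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row, highest first.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # A sample is a hit if its label appears anywhere in its top-k list.
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example with three samples and three classes (made-up numbers).
example_scores = [[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.3, 0.5]]
example_labels = [2, 0, 0]
top1 = top_k_accuracy(example_scores, example_labels, k=1)
top2 = top_k_accuracy(example_scores, example_labels, k=2)
```

Top-k accuracy cannot decrease as k grows, which matches the pattern in the table where Top-5 and Top-10 values exceed the corresponding Top-3 and Top-5 values.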