An Enhanced Object Detection Model for Scene Graph Generation

As computer vision improves, a higher level of understanding is needed to solve more complex problems such as semantic image retrieval, image captioning, and scene understanding. Scene understanding is a long-studied problem, owing to its complexity and the lack of a proper data representation. A scene graph is one of the most powerful data representations for capturing scene context. It encodes the objects present in the scene, their attributes, and the relationships between those objects. As scene graphs have proven their capabilities in complicated tasks, automating scene graph generation has become a necessity. Considerable research has been devoted to obtaining accurate scene graphs using different deep learning architectures. The common module among these architectures is the object detection module, where objects are first located in the input image. In this work, we propose using the most recent object detectors from the YOLOv5 family for the scene graph generation task. The proposed YOLOv5x6 achieves a state-of-the-art result of 32.7 mean average precision (mAP) compared to previous works. Furthermore, the paper reviews the different object detectors used in the literature for scene graph generation.
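To make the data representation concrete, the following is a minimal sketch of a scene graph as described in the abstract: objects, their attributes, and the relationships between objects. The class names (SceneObject, SceneGraph), the box format, and the example values are assumptions chosen for illustration; they are not taken from the paper, and the object entries stand in for whatever an object detector such as YOLOv5x6 would return.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One detected object: a class label, a bounding box, and attributes."""
    label: str
    # Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
    box: tuple[float, float, float, float]
    attributes: list[str] = field(default_factory=list)


@dataclass
class SceneGraph:
    """Objects are nodes; relationships are directed, labeled edges."""
    objects: list[SceneObject] = field(default_factory=list)
    # Each relationship is (subject index, predicate, object index).
    relationships: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label, box, attributes=()):
        """Store an object and return its index for use in relationships."""
        self.objects.append(SceneObject(label, tuple(box), list(attributes)))
        return len(self.objects) - 1

    def add_relationship(self, subject, predicate, obj):
        """Link two existing objects; reject indices that do not exist."""
        for idx in (subject, obj):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"no object with index {idx}")
        self.relationships.append((subject, predicate, obj))

    def triples(self):
        """Return relationships as (subject label, predicate, object label)."""
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.relationships
        ]


# Illustrative values only; in a real pipeline the labels and boxes
# would come from the object detection module.
graph = SceneGraph()
man = graph.add_object("man", (10, 20, 110, 300), ["tall"])
horse = graph.add_object("horse", (90, 120, 400, 330), ["brown"])
graph.add_relationship(man, "riding", horse)
print(graph.triples())
```

In this sketch the detector only populates the nodes; predicting the predicates between them is the job of the later stages of a scene graph generation architecture, which is why detection quality bounds the quality of the final graph.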

Results from the Paper


Task              Dataset        Model     Metric Name  Metric Value  Global Rank
Object Detection  Visual Genome  YOLOv5x6  mAP          32.7          # 1
