FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition

17 Jul 2019 · Hongje Seong, Junhyuk Hyun, Euntai Kim

Scene recognition is an image recognition problem aimed at predicting the category of the place at which an image was taken. In this paper, a new scene recognition method using a convolutional neural network (CNN) is proposed. The method is based on fusing the object and the scene information in a given image, and the resulting CNN framework is named FOSNet (fusion of object and scene network). In addition, a new loss called the scene coherence loss (SCL) is developed to train FOSNet and improve scene recognition performance. The SCL exploits a unique trait of scenes: 'sceneness' spreads across the image, and the scene class does not change from one part of the image to another. FOSNet was evaluated on the three most popular scene recognition datasets, and state-of-the-art performance was obtained on two of them: 60.14% on Places 2 and 90.37% on MIT Indoor 67. The second-highest performance of 77.28% was obtained on SUN 397.
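The abstract describes two ideas: a two-stream fusion of object and scene features, and a coherence loss that assumes the scene class is constant over the whole image. The sketch below (PyTorch) is a minimal illustration of both ideas, not the authors' reference implementation: the concatenation-based fusion, the per-location 1x1 classifier, and the KL-divergence consistency term between neighbouring spatial predictions are all assumptions made for illustration, and the backbones and loss weight are placeholders.

```python
# Illustrative sketch of (1) object/scene feature fusion and (2) a scene-
# coherence-style loss. Fusion by concatenation and the pairwise KL term are
# assumptions, not the paper's exact FOSNet/SCL formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class FOSSketch(nn.Module):
    def __init__(self, num_classes=365):
        super().__init__()
        # Object stream: ImageNet-pretrained backbone (assumption).
        obj = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        # Scene stream: a second backbone, ideally Places365-pretrained
        # (those weights would be loaded separately; assumption).
        scn = models.resnet50(weights=None)
        self.obj_features = nn.Sequential(*list(obj.children())[:-2])
        self.scn_features = nn.Sequential(*list(scn.children())[:-2])
        # Per-location classifier on the fused map, so every spatial cell
        # produces its own class prediction (used by the coherence term).
        self.classifier = nn.Conv2d(2048 * 2, num_classes, kernel_size=1)

    def forward(self, x):
        fused = torch.cat([self.obj_features(x), self.scn_features(x)], dim=1)
        logits_map = self.classifier(fused)      # (B, C, H, W)
        logits = logits_map.mean(dim=(2, 3))     # image-level prediction
        return logits, logits_map


def scene_coherence_loss(logits_map):
    """Penalize disagreement between neighbouring spatial predictions,
    reflecting the assumption that the scene class does not change over
    the image (illustrative stand-in for the paper's SCL)."""
    log_p = F.log_softmax(logits_map, dim=1)
    p = log_p.exp()
    # KL divergence between horizontally and vertically adjacent cells.
    kl_h = F.kl_div(log_p[..., :, 1:], p[..., :, :-1], reduction="batchmean")
    kl_v = F.kl_div(log_p[..., 1:, :], p[..., :-1, :], reduction="batchmean")
    return kl_h + kl_v


if __name__ == "__main__":
    # Usage: combine standard cross-entropy with the coherence term
    # (the 0.1 weight is an arbitrary placeholder).
    model = FOSSketch(num_classes=365)
    images = torch.randn(2, 3, 224, 224)
    targets = torch.tensor([3, 42])
    logits, logits_map = model(images)
    loss = F.cross_entropy(logits, targets) + 0.1 * scene_coherence_loss(logits_map)
    loss.backward()
```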

Task               Dataset            Model   Metric          Value   Global Rank
Scene Recognition  MIT Indoor Scenes  FOSNet  Accuracy        90.3    #1
Scene Recognition  Places365          FOSNet  Top 1 Accuracy  60.14   #1
Scene Recognition  Places365          FOSNet  Top 5 Accuracy  88.86   #1
Scene Recognition  SUN397             FOSNet  Accuracy        77.28   #1
