Recognize Complex Events From Static Images by Fusing Deep Channels

CVPR 2015  ·  Yuanjun Xiong, Kai Zhu, Dahua Lin, Xiaoou Tang

A considerable portion of web images capture events that occur in our personal lives or social activities. In this paper, we aim to develop an effective method for recognizing events from such images. Despite the extensive study of event recognition, most existing methods rely on videos and are not directly applicable to this task. Generally, events are complex phenomena that involve interactions among people and objects, so analyzing event photos requires techniques that go beyond recognizing individual objects and carry out joint reasoning based on evidence of multiple aspects. Inspired by the recent success of deep learning, we formulate a multi-layer framework that takes into account both visual appearance and the interactions among humans and objects, and combines them via semantic fusion. An important issue arising here is that the humans and objects discovered by detectors come in the form of bounding boxes, and there is no straightforward way to represent their interactions or incorporate them into a deep network. We address this with a novel strategy that projects the detected instances onto multi-scale spatial maps. On a large dataset with $60,000$ images, the proposed method achieved substantial improvement over the state of the art, raising the accuracy of event recognition by over $10\%$.
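The core trick described above, rendering detector outputs (bounding boxes plus confidences) as dense multi-scale spatial maps that a deep network can consume, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the grid sizes, and the max-per-cell rendering rule are our own choices, not the paper's exact implementation.

```python
import numpy as np

def boxes_to_spatial_maps(boxes, scores, image_size, map_sizes=(8, 16, 32)):
    """Project detected bounding boxes onto multi-scale spatial maps.

    Each detection is rasterized as a filled region on a coarse grid,
    so the spatial layout of people/objects becomes a dense input.
    Assumption: overlapping detections are merged by keeping the
    strongest score per cell (max-pooling).

    boxes:      (N, 4) array of [x1, y1, x2, y2] in pixels
    scores:     (N,)   detection confidences in [0, 1]
    image_size: (height, width) of the source image
    """
    h, w = image_size
    maps = []
    for s in map_sizes:
        m = np.zeros((s, s), dtype=np.float32)
        for (x1, y1, x2, y2), conf in zip(boxes, scores):
            # Map pixel coordinates to grid cells at this scale.
            c1 = int(np.floor(x1 / w * s))
            c2 = int(np.ceil(x2 / w * s))
            r1 = int(np.floor(y1 / h * s))
            r2 = int(np.ceil(y2 / h * s))
            # Cover at least one cell and stay inside the grid.
            r2 = min(max(r2, r1 + 1), s)
            c2 = min(max(c2, c1 + 1), s)
            # Keep the strongest detection per cell.
            m[r1:r2, c1:c2] = np.maximum(m[r1:r2, c1:c2], conf)
        maps.append(m)
    return maps
```

The resulting stack of maps (one per scale, optionally one per object category) can then be fed to a convolutional channel alongside the raw-appearance channel and fused at the semantic level, in the spirit of the framework the abstract describes.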

Datasets

Introduced in the Paper: WIDER