Learning spatiotemporal representations for human fall detection in surveillance video

In this paper, a computer vision based framework is proposed that detects falls from surveillance videos. Firstly, we employ background subtraction and rank pooling to model spatial and temporal representations in videos, respectively. We then introduce a novel three-stream Convolutional Neural Networks as an event classifier. Silhouettes and their motion history images serve as input to the first two streams, while dynamic images whose temporal duration is equal to motion history images, are used as input to the third stream. Finally, we apply voting on the results of event classification to perform multicamera fall detection. The main novelty of our method against the conventional ones is that highquality spatiotemporal representations in different levels are learned to take full advantage of the appearance and motion information. Extensive experiments have been conducted on two widely used fall datasets. The results have shown to demonstrate the effectiveness of the proposed method.

PDF Abstract
No code implementations yet. Submit your code now


Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here