Learning Temporal Regularity in Video Sequences

Perceiving meaningful activities in a long video sequence is a challenging problem, due both to the ambiguous definition of 'meaningfulness' and to clutter in the scene. We approach this problem by learning a generative model for regular motion patterns, termed regularity, from multiple sources with very limited supervision. Specifically, we propose two methods built upon autoencoders, chosen for their ability to work with little to no supervision. First, we leverage conventional handcrafted spatio-temporal local features and learn a fully connected autoencoder on them. Second, we build a fully convolutional feed-forward autoencoder that learns both the local features and the classifiers in a single end-to-end framework. Our model can capture regularities across multiple datasets. We evaluate our methods both qualitatively and quantitatively, visualizing the learned regularity of videos in various aspects and demonstrating competitive performance on anomaly detection datasets as an application.
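To make the second model concrete, below is a minimal sketch of a fully convolutional feed-forward autoencoder that reconstructs a short temporal stack of frames and is trained only on regular video, assuming PyTorch. The layer widths, the 10-frame window, and the 128x128 input are illustrative choices, not the authors' exact architecture.

```python
# A minimal sketch (not the authors' released code) of a convolutional
# autoencoder over stacked grayscale frames. Hyperparameters are assumptions.
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self, in_frames: int = 10):
        super().__init__()
        # Encoder: spatial downsampling over a temporal stack of frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_frames, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
        )
        # Decoder: mirror the encoder with transposed convolutions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 64, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, in_frames, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.Sigmoid(),  # frames assumed normalized to [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# Training on regular videos only: minimize per-pixel reconstruction error.
model = ConvAE()
clip = torch.rand(8, 10, 128, 128)  # batch of 10-frame stacks (toy data)
loss = nn.functional.mse_loss(model(clip), clip)
loss.backward()
```

At test time, frames of irregular activity reconstruct poorly, so the reconstruction error itself serves as the anomaly signal.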

CVPR 2016 · PDF · Abstract

Results from the Paper


Task                               Dataset     Model         Metric        Value  Global Rank
Abnormal Event Detection In Video  UBI-Fights  Hasan et al.  AUC           0.528  # 6
                                                             Decidability  0.194  # 4
                                                             EER           0.466  # 4
Semi-supervised Anomaly Detection  UBI-Fights  Hasan et al.  AUC           0.528  # 7
                                                             Decidability  0.194  # 4
                                                             EER           0.466  # 4
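The numbers above are frame-level metrics computed from an anomaly score. Below is a minimal sketch of how such scores and metrics can be computed, assuming NumPy and scikit-learn: the regularity score follows the paper's min-max scaling of the per-frame reconstruction error (inverted so regular frames score high), the EER reading is the standard ROC-based estimate, and the decidability is assumed to be a Daugman-style d' index; the toy arrays are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def regularity_score(errors: np.ndarray) -> np.ndarray:
    # Invert a min-max scaled reconstruction error; the paper defines the
    # per-frame regularity score with a similar normalization.
    e = (errors - errors.min()) / (errors.max() - errors.min())
    return 1.0 - e

# Toy data: per-frame reconstruction errors and ground-truth labels
# (1 = anomalous frame). Real evaluations use the full test videos.
errors = np.array([0.20, 0.25, 0.90, 1.10, 0.30, 0.85])
labels = np.array([0, 0, 1, 1, 0, 1])

scores = regularity_score(errors)  # high = regular
anomaly = 1.0 - scores             # high = anomalous

auc = roc_auc_score(labels, anomaly)

# EER: point on the ROC curve where the false-positive rate equals the
# false-negative rate (1 - TPR).
fpr, tpr, _ = roc_curve(labels, anomaly)
eer = fpr[np.nanargmin(np.abs(fpr - (1.0 - tpr)))]

# Decidability d' (assumed Daugman-style): separation of the anomaly-score
# distributions for normal vs. anomalous frames.
mu0, mu1 = anomaly[labels == 0].mean(), anomaly[labels == 1].mean()
var0, var1 = anomaly[labels == 0].var(), anomaly[labels == 1].var()
d_prime = abs(mu1 - mu0) / np.sqrt((var0 + var1) / 2.0)

print(f"AUC={auc:.3f}  EER={eer:.3f}  d'={d_prime:.3f}")
```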

Results from Other Papers


Task                        Dataset          Model    Metric  Value  Rank
Traffic Accident Detection  A3D              Conv-AE  AUC     49.5   # 2
Traffic Accident Detection  SA               Conv-AE  AUC     50.4   # 2
Video Anomaly Detection     HR-Avenue        Conv-AE  AUC     84.8   # 9
Video Anomaly Detection     HR-ShanghaiTech  Conv-AE  AUC     69.8   # 11
