Abnormal Event Detection In Video
12 papers with code • 2 benchmarks • 3 datasets
Abnormal Event Detection In Video is a challenging task in computer vision, as the definition of what an abnormal event looks like depends very much on the context. For instance, a car driving by on the street is regarded as a normal event, but if the car enters a pedestrian area, this is regarded as an abnormal event. A person running on a sports court (normal event) versus running out of a bank (abnormal event) is another example. Although what is considered abnormal depends on the context, we can generally agree that abnormal events should be unexpected events that occur less often than familiar (normal) events.
(Image credit: Ravanbakhsh et al.)
To avoid annotating the anomalous segments or clips in training videos, which is very time-consuming, we propose to learn anomalies through a deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e., the training labels (anomalous or normal) are given at video level instead of clip level.
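To illustrate the weakly supervised setup, the sketch below implements a generic multiple-instance ranking loss in PyTorch: each video is treated as a bag of clip scores, and the highest-scoring clip of an anomalous bag is pushed above the highest-scoring clip of a normal bag by a margin. The bag size, margin, and scoring network are illustrative assumptions, not the exact configuration of the cited paper.

```python
import torch
import torch.nn as nn

class ClipScorer(nn.Module):
    """Small MLP that maps a clip feature vector to an anomaly score in [0, 1]."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, clips):                # clips: (num_clips, feat_dim)
        return self.net(clips).squeeze(-1)   # (num_clips,) anomaly scores

def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge ranking loss on bag maxima: the most anomalous clip of an abnormal
    video should outscore the most anomalous clip of a normal video."""
    return torch.relu(margin - abnormal_scores.max() + normal_scores.max())

# Toy usage with random precomputed clip features (only video-level labels used).
scorer = ClipScorer(feat_dim=4096)
abnormal_bag = torch.randn(32, 4096)   # clips from a video labeled "anomalous"
normal_bag = torch.randn(32, 4096)     # clips from a video labeled "normal"
loss = mil_ranking_loss(scorer(abnormal_bag), scorer(normal_bag))
loss.backward()
```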
Surprisingly, we find that this simple representation is sufficient to achieve state-of-the-art performance on ShanghaiTech, the largest and most complex VAD dataset.
Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events.
Next, features are extracted from each frame using a convolutional neural network (CNN) that is trained to classify between normal and abnormal frames.
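A minimal sketch of such a frame-level pipeline is given below, assuming a frozen ImageNet-pretrained ResNet-18 as the feature extractor and a nearest-neighbor distance to normal-frame features as the outlier score. This is a simplified stand-in: the cited framework trains its CNN to discriminate normal frames from (pseudo-)abnormal ones rather than using a frozen backbone.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.neighbors import NearestNeighbors

# Frozen ImageNet-pretrained backbone used purely as a frame feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()      # keep the 512-d penultimate features
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_features(frames):
    """frames: list of HxWx3 uint8 arrays -> (N, 512) feature matrix."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch).numpy()

def fit_normal_index(normal_frames, k=1):
    """Index the features of *normal* training frames only (outlier-detection setup)."""
    return NearestNeighbors(n_neighbors=k).fit(frame_features(normal_frames))

def anomaly_scores(index, test_frames):
    """Score each test frame by its distance to the closest normal frame."""
    dists, _ = index.kneighbors(frame_features(test_frames))
    return dists.mean(axis=1)          # larger distance = more abnormal
```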
Most existing approaches formulate abnormal event detection as an outlier detection task, due to the scarcity of anomalous data during training.
The main objective is to provide several solutions to these problems by analyzing previous state-of-the-art methods and presenting an extensive overview that clarifies the concepts employed in capturing normal and abnormal patterns.
To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture.
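As a hedged sketch of what such a multi-task setup can look like, the snippet below wires a shared backbone to several heads, each trained on a different proxy objective (predicting whether a clip is played forward or backward, reconstructing a low-resolution middle frame, and regressing a teacher network's features for distillation). The specific proxy tasks and head shapes are illustrative assumptions, not the exact architecture of the cited work.

```python
import torch
import torch.nn as nn

class MultiTaskAnomalyNet(nn.Module):
    """Shared 3D-conv backbone with one head per proxy task."""
    def __init__(self, teacher_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(               # (B, 3, T, H, W) -> (B, 64)
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
        )
        self.arrow_of_time = nn.Linear(64, 2)            # forward vs. backward clip
        self.middle_frame = nn.Linear(64, 3 * 32 * 32)   # low-res middle-frame reconstruction
        self.distill = nn.Linear(64, teacher_dim)        # regress teacher features

    def forward(self, clip):
        z = self.backbone(clip)
        return self.arrow_of_time(z), self.middle_frame(z), self.distill(z)

# Joint training on the proxy objectives; at test time the anomaly score is
# typically derived from how poorly a clip is solved by the proxy tasks.
model = MultiTaskAnomalyNet()
clip = torch.randn(2, 3, 8, 64, 64)          # toy batch of 2 clips
time_logits, frame_pred, student_feat = model(clip)
loss = (nn.functional.cross_entropy(time_logits, torch.tensor([0, 1]))
        + nn.functional.mse_loss(frame_pred, torch.zeros_like(frame_pred))
        + nn.functional.mse_loss(student_feat, torch.randn_like(student_feat)))
loss.backward()
```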