The ShanghaiTech dataset is a large-scale crowd counting dataset consisting of 1,198 annotated crowd images. It is divided into two parts: Part-A, containing 482 images, and Part-B, containing 716 images. Part-A is split into train and test subsets of 300 and 182 images, respectively. Part-B is split into train and test subsets of 400 and 316 images, respectively. Each person in a crowd image is annotated with a single point close to the center of the head, giving 330,165 annotated people in total. Part-A images were collected from the Internet, while Part-B images were collected on busy streets in Shanghai.
274 PAPERS • 7 BENCHMARKS
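The split sizes above can be cross-checked, and the point annotations relate directly to the crowd count: the count of an image is the number of head points. The sketch below is illustrative only. The `(x, y)` point format and the helper names (`part_totals`, `points_to_density`, `count_from_density`) are hypothetical, not part of the dataset's tooling. It places one unit of mass per head point ("delta" density map) for simplicity; many crowd counting methods instead blur each point with a Gaussian kernel, which preserves the total mass away from image borders.

```python
# Split sizes as stated in the ShanghaiTech description.
SPLITS = {
    "Part-A": {"train": 300, "test": 182},
    "Part-B": {"train": 400, "test": 316},
}


def part_totals(splits):
    """Return the number of images in each part (train + test)."""
    return {part: sum(sizes.values()) for part, sizes in splits.items()}


def points_to_density(points, height, width):
    """Build a delta density map with one unit of mass per annotated head point.

    points: iterable of (x, y) pixel coordinates (hypothetical format).
    Points falling outside the image are ignored.
    """
    density = [[0.0] * width for _ in range(height)]
    for x, y in points:
        col, row = int(x), int(y)
        if 0 <= row < height and 0 <= col < width:
            density[row][col] += 1.0
    return density


def count_from_density(density):
    """The crowd count is the sum (integral) of the density map."""
    return sum(sum(row) for row in density)


totals = part_totals(SPLITS)
print(totals)                    # {'Part-A': 482, 'Part-B': 716}
print(sum(totals.values()))      # 1198

heads = [(10.2, 5.7), (11.9, 5.1), (40.0, 30.0)]
dmap = points_to_density(heads, height=48, width=64)
print(count_from_density(dmap))  # 3.0
```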
The ShanghaiTech Campus dataset has 13 scenes with complex lighting conditions and camera angles. It contains 130 abnormal events and over 270,000 training frames. Both frame-level and pixel-level ground truth of the abnormal events are annotated.
198 PAPERS • 8 BENCHMARKS
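Frame-level ground truth means every frame carries a normal/abnormal label. A minimal sketch of that representation follows, assuming (hypothetically) that abnormal events are given as inclusive `(start, end)` frame intervals; the dataset's actual annotation file format is not described here, and pixel-level masks are not covered.

```python
from typing import List, Tuple


def intervals_to_frame_labels(num_frames: int, intervals: List[Tuple[int, int]]) -> List[int]:
    """Expand abnormal-event intervals into per-frame binary labels.

    intervals: hypothetical list of inclusive (start, end) frame indices.
    Returns a list where 1 marks an abnormal frame and 0 a normal frame.
    """
    labels = [0] * num_frames
    for start, end in intervals:
        # Clip each interval to the video length so it cannot index out of range.
        for f in range(max(start, 0), min(end, num_frames - 1) + 1):
            labels[f] = 1
    return labels


labels = intervals_to_frame_labels(10, [(2, 4), (8, 12)])
print(labels)  # [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
```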
The UCF-Crime dataset is a large-scale dataset of 128 hours of video. It consists of 1,900 long, untrimmed real-world surveillance videos covering 13 realistic anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies were selected because of their significant impact on public safety.
134 PAPERS • 3 BENCHMARKS
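The 13 anomaly categories can be encoded as a label set, and the aggregate figures give an average video length. In the sketch below, the class strings are copied from the description rather than from the dataset's own file or folder naming, and the integer ids are an arbitrary, non-official ordering.

```python
# The 13 anomaly classes listed in the UCF-Crime description.
ANOMALY_CLASSES = [
    "Abuse", "Arrest", "Arson", "Assault", "Road Accident", "Burglary",
    "Explosion", "Fighting", "Robbery", "Shooting", "Stealing",
    "Shoplifting", "Vandalism",
]

# Map each class name to an integer id (ordering is arbitrary, not official).
CLASS_TO_ID = {name: i for i, name in enumerate(ANOMALY_CLASSES)}


def average_minutes_per_video(total_hours: float, num_videos: int) -> float:
    """Average video length in minutes, computed from the aggregate figures."""
    return total_hours * 60 / num_videos


print(len(CLASS_TO_ID))                                # 13
print(round(average_minutes_per_video(128, 1900), 2))  # 4.04
```

On average, then, a UCF-Crime video runs about four minutes, which is why the description calls them long and untrimmed compared with clip-level action datasets.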
XD-Violence is a large-scale audio-visual dataset for violence detection in videos.
55 PAPERS • 1 BENCHMARK
UBI-Fights focuses on a single, specific anomaly (fighting) while still covering a wide diversity of fighting scenarios. It is a new large-scale dataset of 80 hours of video, fully annotated at the frame level. It consists of 1,000 videos, of which 216 contain a fight event and 784 show normal daily-life situations. Unnecessary video segments (e.g., video introductions and news) that could disturb the learning process were removed.
7 PAPERS • 2 BENCHMARKS
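The 216/784 split means fight videos are the minority class. A small sketch of the resulting balance is shown below. Using the negative-to-positive ratio as a weight for the positive class in a binary loss is one common heuristic, included here as an illustrative assumption rather than a recommendation from the dataset authors; `class_balance` is a hypothetical helper.

```python
def class_balance(num_positive: int, num_negative: int):
    """Return (positive fraction, negative-to-positive ratio).

    The ratio is a common heuristic weight for up-weighting the rarer
    positive class in a binary loss (illustrative assumption).
    """
    total = num_positive + num_negative
    return num_positive / total, num_negative / num_positive


frac, pos_weight = class_balance(216, 784)
print(f"{frac:.1%}")         # 21.6%
print(round(pos_weight, 2))  # 3.63
```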
Vision-based Fallen Person (VFP290K) is a new large-scale dataset for fallen person detection, composed of images of fallen persons collected in various real-world scenarios. VFP290K consists of 294,714 frames of fallen persons extracted from 178 videos, covering 131 scenes in 49 locations.
2 PAPERS • 1 BENCHMARK
This weakly labelled dataset focuses solely on the robbery category. It contains 486 new real-world robbery surveillance videos acquired from public sources.
0 PAPERS • NO BENCHMARKS YET