Cross-View Cross-Scene Multi-View Crowd Counting Dataset

A large synthetic multi-camera crowd counting dataset covering many scenes and camera views, so that it captures a wide range of possible variations. Generating the data synthetically avoids the difficulty of collecting and annotating a real dataset of this size.

The dataset is generated using GCC-CL [50], which works as a plug-in for the game “Grand Theft Auto V”. The generation process consists of two parts: scene simulation and multi-view recording. First, crowd scenes are simulated by selecting the background, region of interest (ROI), weather condition, human models and postures, etc. Next, cameras are placed at various locations to record the crowd scene from different perspectives. Bird's-eye views are also collected for visualization. Each person is assigned a specific ID that links their coordinates in the world coordinate system to their locations in each camera-view image. The camera parameters, such as coordinates, deflection angles and fields-of-view, are also recorded.
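
The per-person ID mapping described above can be pictured with a small Python sketch. The on-disk annotation format is not specified here, so every name below (`Camera`, `Person`, `image_xy`, `count_in_view`) is hypothetical and only illustrates how a person ID might tie a world coordinate to per-view image locations. The units of the angles and field-of-view are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Camera:
    """Hypothetical record of the recorded camera parameters."""
    view_id: int
    position: Tuple[float, float, float]    # camera coordinates in the world frame
    deflection: Tuple[float, float, float]  # deflection angles (units assumed)
    fov: float                              # field-of-view (units assumed)


@dataclass
class Person:
    """Hypothetical record linking one person ID to world and image locations."""
    person_id: int
    world_xyz: Tuple[float, float, float]
    # Maps a camera view_id to the person's (x, y) pixel location in that view.
    image_xy: Dict[int, Tuple[float, float]] = field(default_factory=dict)


def count_in_view(people: List[Person], view_id: int,
                  width: int = 1920, height: int = 1080) -> int:
    """Count people whose recorded location falls inside the image of one view."""
    count = 0
    for person in people:
        xy = person.image_xy.get(view_id)
        if xy is None:
            continue  # no location recorded for this person in this view
        x, y = xy
        if 0 <= x < width and 0 <= y < height:
            count += 1
    return count


# Toy data, not taken from the dataset.
people = [
    Person(1, (0.0, 0.0, 0.0), {0: (100.0, 200.0), 1: (2000.0, 50.0)}),
    Person(2, (1.0, 1.0, 0.0), {0: (500.0, 600.0)}),
]
views_0_and_1 = (count_in_view(people, 0), count_in_view(people, 1))
```

With this toy data, both people fall inside view 0, while view 1 counts nobody because the only recorded location lies outside the 1920×1080 frame.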

In total, the synthetic multi-view (MV) counting dataset contains 31 scenes. For each scene, around 100 camera views are set for multi-view recording. The multi-view recording is performed 100 times with different crowd distributions in the scene, i.e., each scene contains 100 multi-view frames, with each frame comprising 60 to 120 camera views. The image resolution is 1920×1080.
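
As a rough illustration of scale, the figures above imply bounds on the total number of camera-view images. The short Python sketch below does this arithmetic. It assumes every multi-view frame in every scene has between 60 and 120 views, and the function name `image_count_bounds` is made up for this example.

```python
# Figures taken from the description above.
NUM_SCENES = 31
FRAMES_PER_SCENE = 100
MIN_VIEWS_PER_FRAME = 60
MAX_VIEWS_PER_FRAME = 120


def image_count_bounds(scenes: int = NUM_SCENES,
                       frames: int = FRAMES_PER_SCENE,
                       min_views: int = MIN_VIEWS_PER_FRAME,
                       max_views: int = MAX_VIEWS_PER_FRAME) -> tuple:
    """Return (lower, upper) bounds on the number of camera-view images."""
    total_frames = scenes * frames  # multi-view frames across all scenes
    return total_frames * min_views, total_frames * max_views


low, high = image_count_bounds()
print(f"multi-view frames: {NUM_SCENES * FRAMES_PER_SCENE}")
print(f"camera-view images: between {low} and {high}")
```

Under these assumptions the dataset holds 3,100 multi-view frames and between 186,000 and 372,000 individual images. The actual total depends on the exact number of views per frame, which is not given here.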


