Wildtrack is a large-scale, high-resolution dataset. It was captured with seven static cameras in a public open area and records unscripted, dense groups of pedestrians standing and walking. Together with the camera frames, we provide an accurate joint (extrinsic and intrinsic) calibration, as well as 7 series of 400 frames annotated for detection at a rate of 2 frames per second. This results in over 40,000 bounding boxes delimiting every person present in the area of interest, for a total of more than 300 individuals.
57 PAPERS • 2 BENCHMARKS
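A minimal sketch of iterating over the Wildtrack detection annotations described above, assuming the commonly distributed layout in which each annotated frame has one JSON file listing persons with per-view bounding boxes; the directory name, field names, and the "-1 means not visible" convention are assumptions about the release format, not guaranteed by the description.

```python
import json
from pathlib import Path

# Assumed layout: one JSON file per annotated frame, each containing a list of
# persons with a "personID", a ground-plane "positionID", and per-view boxes.
ANNOTATION_DIR = Path("Wildtrack/annotations_positions")  # hypothetical path

def count_boxes(annotation_dir: Path) -> int:
    """Count all per-view bounding boxes across the annotated frames."""
    total = 0
    for frame_file in sorted(annotation_dir.glob("*.json")):
        persons = json.loads(frame_file.read_text())
        for person in persons:
            # Assumption: a view entry with xmin == -1 means the person is
            # not visible in that camera and carries no box.
            total += sum(1 for v in person["views"] if v.get("xmin", -1) >= 0)
    return total

if __name__ == "__main__":
    print(count_boxes(ANNOTATION_DIR))
```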
MultiviewX is a synthetic multiview pedestrian detection dataset. It is built in Unity using pedestrian models from PersonX. The dataset covers an area of 16 meters by 25 meters, with the ground plane quantized into a 640×1000 grid. There are 6 cameras with overlapping fields of view, each of which outputs a 1080×1920 resolution image. On average, 4.41 cameras cover the same location.
21 PAPERS • 2 BENCHMARKS
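Since MultiviewX quantizes a 16 m × 25 m ground plane into a 640×1000 grid (2.5 cm per cell), mapping a world position to a grid index is a simple scaling. The helper below is an illustrative sketch; the coordinate origin and axis convention are assumptions, not the dataset's documented convention.

```python
import numpy as np

# MultiviewX: 16 m x 25 m ground plane quantized into a 640 x 1000 grid,
# i.e. 0.025 m per cell along each axis (from the description above).
GRID_SHAPE = (640, 1000)          # (cells along the 16 m side, cells along the 25 m side)
AREA_METERS = (16.0, 25.0)
CELL_SIZE = (AREA_METERS[0] / GRID_SHAPE[0], AREA_METERS[1] / GRID_SHAPE[1])

def world_to_grid(x_m: float, y_m: float) -> tuple[int, int]:
    """Map a ground-plane position in meters to a grid cell index.

    Assumes the origin sits at one corner of the annotated area; the actual
    MultiviewX convention may differ.
    """
    i = int(np.clip(x_m / CELL_SIZE[0], 0, GRID_SHAPE[0] - 1))
    j = int(np.clip(y_m / CELL_SIZE[1], 0, GRID_SHAPE[1] - 1))
    return i, j

print(world_to_grid(8.0, 12.5))   # -> (320, 500), the center of the area
```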
Datasets for multi-view crowd counting in wide-area scenes. Includes our CityStreet dataset, as well as the counting annotations and metadata for multi-view counting on PETS2009 and DukeMTMC. CityStreet is a real-world dataset collected at the intersection of a crowded city street. The scene covers roughly 58m×72m, and the ground-plane map resolution is 320×384.
11 PAPERS • 1 BENCHMARK
CVCS is a synthetic multi-view people dataset containing 31 scenes, 23 for training and the remaining 8 for testing. Scene sizes range from about 10m×20m to 90m×80m. Each scene contains 100 multi-view frames. The ground-plane map resolution is 900×800, where each grid cell corresponds to 0.1 meters in the real world. During training, 5 views are randomly sampled 5 times per scene frame in each iteration; during evaluation, the same number of views is sampled 21 times.
10 PAPERS • 1 BENCHMARK
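A small sketch of the CVCS view-sampling protocol described above: 5 of a scene's cameras drawn at random, 5 times per frame during training and 21 times during evaluation. The per-scene camera count and seed handling are placeholders, not values from the dataset.

```python
import random

def sample_view_sets(num_cameras: int, views_per_set: int = 5,
                     num_sets: int = 5, seed: int | None = None) -> list[list[int]]:
    """Draw `num_sets` random subsets of `views_per_set` camera indices.

    Mirrors the protocol described above: 5 sets per scene frame in training,
    21 sets in evaluation (pass num_sets=21).
    """
    rng = random.Random(seed)
    return [rng.sample(range(num_cameras), views_per_set) for _ in range(num_sets)]

# Illustrative camera count; CVCS scenes vary in how many cameras they contain.
train_sets = sample_view_sets(num_cameras=40, num_sets=5, seed=0)
eval_sets = sample_view_sets(num_cameras=40, num_sets=21, seed=0)
```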
Home Action Genome is a large-scale multi-view video database of indoor daily activities. Every activity is captured by synchronized multi-view cameras, including an egocentric view. There are 30 hours of video covering 70 classes of daily activities and 453 classes of atomic actions.
9 PAPERS • 2 BENCHMARKS
Vehicle-to-Everything (V2X) networks have enabled collaborative perception in autonomous driving, a promising remedy for the fundamental limitations of stand-alone intelligence such as blind zones and restricted long-range perception. However, the lack of datasets has severely hindered the development of collaborative perception algorithms. In this work, we release DOLPHINS: Dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving, a new simulated, large-scale, multi-scenario, multi-view, multi-modality autonomous driving dataset that provides a benchmark platform for interconnected autonomous driving. DOLPHINS improves on current datasets in six dimensions, including temporally-aligned images and point clouds from both vehicles and Road Side Units (RSUs), which enable both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) collaborative perception, and 6 typical scenarios with dynamic weather conditions.
7 PAPERS • NO BENCHMARKS YET
The GMVD dataset consists of synthetic scenes captured using the GTA-V and Unity graphics engines. It covers a variety of scenes under different conditions, including daytime variations (morning, afternoon, evening, night) and weather variations (sunny, cloudy, rainy, snowy). The purpose of the dataset is twofold: first, to benchmark the generalization capabilities of multi-view detection algorithms; second, to serve as a synthetic training source whose trained models can be applied directly to real-world data.
3 PAPERS • 1 BENCHMARK
The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and the automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided both in a unified format and in the ground-truth format used by the MOTChallenge.
3 PAPERS • NO BENCHMARKS YET
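Because the RailEye3D annotations are also supplied in the MOTChallenge ground-truth format, a plain CSV with one box per line (frame, id, bb_left, bb_top, bb_width, bb_height, conf, class, visibility in the MOT16/17 convention), a minimal reader might look like the sketch below; the file path is hypothetical and the exact RailEye3D directory layout may differ.

```python
import csv
from collections import defaultdict

def load_mot_gt(path: str) -> dict[int, list[dict]]:
    """Parse a MOTChallenge-style gt.txt file into per-frame box lists.

    Expected columns (MOT16/17 convention):
    frame, id, bb_left, bb_top, bb_width, bb_height, conf, class, visibility
    """
    boxes_per_frame: dict[int, list[dict]] = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            left, top, w, h = map(float, row[2:6])
            boxes_per_frame[frame].append({"id": track_id, "bbox": (left, top, w, h)})
    return boxes_per_frame

# Hypothetical path for illustration only.
gt = load_mot_gt("raileye3d/sequence_01/gt/gt.txt")
```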
The MultiviewC dataset mainly contributes to multiview cattle action recognition, 3D object detection, and tracking. We build the novel synthetic MultiviewC dataset in UE4, based on a real cattle video dataset provided by CSIRO. The format of our dataset follows MultiviewX in its setup, annotations, and file structure.
2 PAPERS • NO BENCHMARKS YET
This is a synthetic three-view numerical dataset of 10,000 instances with 4 clusters and 2 feature components per view. The data points in each view are generated from a four-component bivariate Gaussian mixture model (GMM) whose mixing proportions are shared across the views: $\alpha_1^{(1)}=\alpha_1^{(2)}=\alpha_1^{(3)}=0.3$; $\alpha_2^{(1)}=\alpha_2^{(2)}=\alpha_2^{(3)}=0.15$; $\alpha_3^{(1)}=\alpha_3^{(2)}=\alpha_3^{(3)}=0.15$; and $\alpha_4^{(1)}=\alpha_4^{(2)}=\alpha_4^{(3)}=0.4$. The means $\mu_k^{(1)}$ for the first view are $[-10, -5]$, $[-9, 11]$, $[0, 6]$, and $[4, 0]$; the means $\mu_k^{(2)}$ for the second view are $[-8, -12]$, $[-6, -3]$, $[-2, 7]$, and $[2, 1]$; and the means $\mu_k^{(3)}$ for the third view are $[-5, -10]$, $[-8, -1]$, $[0, 5]$, and $[5, -4]$. The covariance matrices of the first component for the three views are $\Sigma_1^{(1)}=\Sigma_1^{(2)}=\Sigma_1^{(3)}=\left[\begin{smallmatrix}1 & 0\\ 0 & 1\end{smallmatrix}\right]$; $\Sigma_2^{(1)}=\ldots$
1 PAPER • NO BENCHMARKS YET
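A sketch of generating data matching the description above: three views, four clusters with shared mixing proportions (0.3, 0.15, 0.15, 0.4), and the listed per-view bivariate means. The covariances beyond the first component are cut off in the description, so identity covariance is used throughout as a placeholder assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10_000                                    # total instances
mix = np.array([0.3, 0.15, 0.15, 0.4])        # shared mixing proportions
means = {                                     # per-view cluster means (from the text)
    1: np.array([[-10, -5], [-9, 11], [0, 6], [4, 0]], dtype=float),
    2: np.array([[-8, -12], [-6, -3], [-2, 7], [2, 1]], dtype=float),
    3: np.array([[-5, -10], [-8, -1], [0, 5], [5, -4]], dtype=float),
}
# Identity covariance for every component: only the first component's covariance
# is given in the description, so this is a placeholder for the rest.
cov = np.eye(2)

labels = rng.choice(4, size=N, p=mix)         # one shared cluster label per instance
views = {
    v: np.stack([rng.multivariate_normal(means[v][k], cov) for k in labels])
    for v in (1, 2, 3)
}
# views[1], views[2], views[3] are each (10000, 2) arrays; labels holds the clusters.
```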