MSMT17 is a multi-scene multi-time person re-identification dataset. The dataset consists of 180 hours of videos, captured by 12 outdoor cameras, 3 indoor cameras, and during 12 time slots. The videos cover a long period of time and present complex lighting variations, and it contains a large number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes.
238 PAPERS • 6 BENCHMARKS
MARS (Motion Analysis and Re-identification Set) is a large scale video based person reidentification dataset, an extension of the Market-1501 dataset. It has been collected from six near-synchronized cameras. It consists of 1,261 different pedestrians, who are captured by at least 2 cameras. The variations in poses, colors and illuminations of pedestrians, as well as the poor image quality, make it very difficult to yield high matching accuracy. Moreover, the dataset contains 3,248 distractors in order to make it more realistic. Deformable Part Model and GMMCP tracker were used to automatically generate the tracklets (mostly 25-50 frames long).
170 PAPERS • 2 BENCHMARKS
The CUKL-SYSY dataset is a large scale benchmark for person search, containing 18,184 images and 8,432 identities. Different from previous re-id benchmarks, matching query persons with manually cropped pedestrians, this dataset is much closer to real application scenarios by searching person from whole images in the gallery.
92 PAPERS • 2 BENCHMARKS
PRW is a large-scale dataset for end-to-end pedestrian detection and person recognition in raw video frames. PRW is introduced to evaluate Person Re-identification in the Wild, using videos acquired through six synchronized cameras. It contains 932 identities and 11,816 frames in which pedestrians are annotated with their bounding box positions and identities.
69 PAPERS • NO BENCHMARKS YET
ETH is a dataset for pedestrian detection. The testing set contains 1,804 images in three video clips. The dataset is captured from a stereo rig mounted on car, with a resolution of 640 x 480 (bayered), and a framerate of 13--14 FPS.
59 PAPERS • 5 BENCHMARKS
A novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production.
45 PAPERS • 7 BENCHMARKS
CityFlow is a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. The dataset contains more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions.
43 PAPERS • 2 BENCHMARKS
The iLIDS-VID dataset is a person re-identification dataset which involves 300 different pedestrians observed across two disjoint camera views in public open space. It comprises 600 image sequences of 300 distinct individuals, with one pair of image sequences from two camera views for each person. Each image sequence has variable length ranging from 23 to 192 image frames, with an average number of 73. The iLIDS-VID dataset is very challenging due to clothing similarities among people, lighting and viewpoint variations across camera views, cluttered background and random occlusions.
19 PAPERS • 2 BENCHMARKS
iQIYI-VID dataset, which comprises video clips from iQIYI variety shows, films, and television dramas. The whole dataset contains 500,000 videos clips of 5,000 celebrities. The length of each video is 1~30 seconds.
6 PAPERS • NO BENCHMARKS YET
The images in DukeMTMC-attribute dataset comes from Duke University. There are 1812 identities and 34183 annotated bounding boxes in the DukeMTMC-attribute dataset. This dataset contains 702 identities for training and 1110 identities for testing, corresponding to 16522 and 17661 images respectively. The attributes are annotated in the identity level, every image in this dataset is annotated with 23 attributes.
3 PAPERS • NO BENCHMARKS YET
Multi-view Extended Videos with Identities dataset (MEVID) is a dataset for large-scale, video person re-identification (ReID) in the wild. It spans an extensive indoor and outdoor environment across nine unique dates in a 73-day window, various camera viewpoints, and entity clothing changes. Specifically, it contains labels of the identities of 158 unique people wearing 598 outfits taken from 8, 092 tracklets, average length of about 590 frames, seen in 33 camera views from the very large-scale MEVA person activities dataset.
The 42Street dataset is based on a theater play as an example of such an application. The dataset is created using a public recording of the 42Street theatre play [42street]. The play is 1.5 hours long and was split into 5 equally long parts of 20 minutes each, with various clothes changes between the different parts.
1 PAPER • NO BENCHMARKS YET