Market-1501 is a large-scale public benchmark dataset for person re-identification. It contains 1501 identities which are captured by six different cameras, and 32,668 pedestrian image bounding-boxes obtained using the Deformable Part Models pedestrian detector. Each person has 3.6 images on average at each viewpoint. The dataset is split into two parts: 750 identities are utilized for training and the remaining 751 identities are used for testing. In the official testing protocol 3,368 query images are selected as probe set to find the correct match across 19,732 reference gallery images.
812 PAPERS • 9 BENCHMARKS
The CUHK03 consists of 14,097 images of 1,467 different identities, where 6 campus cameras were deployed for image collection and each identity is captured by 2 campus cameras. This dataset provides two types of annotations, one by manually labelled bounding boxes and the other by bounding boxes produced by an automatic detector. The dataset also provides 20 random train/test splits in which 100 identities are selected for testing and the rest for training
398 PAPERS • 8 BENCHMARKS
The DukeMTMC-reID (Duke Multi-Tracking Multi-Camera ReIDentification) dataset is a subset of the DukeMTMC for image-based person re-ID. The dataset is created from high-resolution videos from 8 different cameras. It is one of the largest pedestrian image datasets wherein images are cropped by hand-drawn bounding boxes. The dataset consists 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images.
326 PAPERS • 6 BENCHMARKS
The Viewpoint Invariant Pedestrian Recognition (VIPeR) dataset includes 632 people and two outdoor cameras under different viewpoints and light conditions. Each person has one image per camera and each image has been scaled to be 128×48 pixels. It provides the pose angle of each person as 0° (front), 45°, 90° (right), 135°, and 180° (back).
132 PAPERS • NO BENCHMARKS YET
The CUKL-SYSY dataset is a large scale benchmark for person search, containing 18,184 images and 8,432 identities. Different from previous re-id benchmarks, matching query persons with manually cropped pedestrians, this dataset is much closer to real application scenarios by searching person from whole images in the gallery.
92 PAPERS • 2 BENCHMARKS
The SYSU-MM01 is a dataset collected for the Visible-Infrared Re-identification problem. The images in the dataset were obtained from 491 different persons by recording them using 4 RGB and 2 infrared cameras. Within the dataset, the persons are divided into 3 fixed splits to create training, validation and test sets. In the training set, there are 20284 RGB and 9929 infrared images of 296 persons. The validation set contains 1974 RGB and 1980 infrared images of 99 persons. The testing set consists of the images of 96 persons where 3803 infrared images are used as query and 301 randomly selected RGB images are used as gallery.
88 PAPERS • 2 BENCHMARKS
ETH is a dataset for pedestrian detection. The testing set contains 1,804 images in three video clips. The dataset is captured from a stereo rig mounted on car, with a resolution of 640 x 480 (bayered), and a framerate of 13--14 FPS.
59 PAPERS • 5 BENCHMARKS
Occluded REID is an occluded person dataset captured by mobile cameras, consisting of 2,000 images of 200 occluded persons (see Fig. (c)). Each identity has 5 full-body person images and 5 occluded person images with different types of occlusion.
59 PAPERS • 1 BENCHMARK
CityFlow is a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. The dataset contains more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions.
43 PAPERS • 2 BENCHMARKS
Partial REID is a specially designed partial person reidentification dataset that includes 600 images from 60 people, with 5 full-body images and 5 occluded images per person. These images were collected on a university campus by 6 cameras from different viewpoints, backgrounds and different types of occlusion. The examples of partial persons in the Partial REID dataset are shown in the Figure.
39 PAPERS • NO BENCHMARKS YET
Veri-Wild is the largest vehicle re-identification dataset (as of CVPR 2019). The dataset is captured from a large CCTV surveillance system consisting of 174 cameras across one month (30× 24h) under unconstrained scenarios. This dataset comprises 416,314 vehicle images of 40,671 identities. Evaluation on this dataset is split across three subsets: small, medium and large; comprising 3000, 5000 and 10,000 identities respectively (in probe and gallery sets).
38 PAPERS • 3 BENCHMARKS
RSTPReid contains 20505 images of 4,101 persons from 15 cameras. Each person has 5 corresponding images taken by different cameras with complex both indoor and outdoor scene transformations and backgrounds in various periods of time, which makes RSTPReid much more challenging and more adaptable to real scenarios. Each image is annotated with 2 textual descriptions. For data division, 3701 (index < 18505), 200 (18505 <= index < 19505) and 200 (index >= 19505) identities are utilized for training, validation and testing, respectively (Marked by item 'split' in the JSON file). Each sentence is no shorter than 23 words.
33 PAPERS • 1 BENCHMARK
Occluded-DukeMTMC contains 15,618 training images, 17,661 gallery images, and 2,210 occluded query images. The experiment results on Occluded-DukeMTMC will demonstrate the superiority of our method in Occluded Person Re-ID problems, let alone that our method does not need any manually cropping procedure as pre-process.
26 PAPERS • 1 BENCHMARK
Market-1501-C is an evaluation set that consists of algorithmically generated corruptions applied to the Market-1501 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
22 PAPERS • 1 BENCHMARK
The iLIDS-VID dataset is a person re-identification dataset which involves 300 different pedestrians observed across two disjoint camera views in public open space. It comprises 600 image sequences of 300 distinct individuals, with one pair of image sequences from two camera views for each person. Each image sequence has variable length ranging from 23 to 192 image frames, with an average number of 73. The iLIDS-VID dataset is very challenging due to clothing similarities among people, lighting and viewpoint variations across camera views, cluttered background and random occlusions.
19 PAPERS • 2 BENCHMARKS
SYSU-30k contains 30k categories of persons, which is about 20 times larger than CUHK03 (1.3k categories) and Market1501 (1.5k categories), and 30 times larger than ImageNet (1k categories). SYSU-30k contains 29,606,918 images. Moreover, SYSU-30k provides not only a large platform for the weakly supervised ReID problem but also a more challenging test set that is consistent with the realistic setting for standard evaluation.
14 PAPERS • 2 BENCHMARKS
Partial iLIDS is a dataset for occluded person person re-identification. It contains a total of 476 images of 119 people captured by 4 non-overlapping cameras. Some images contain people occluded by other individuals or luggage.
12 PAPERS • NO BENCHMARKS YET
One large-scale database for Text-to-Image Person Re-identification, i.e., Text-based Person Retrieval.
11 PAPERS • 2 BENCHMARKS
P-DukeMTMC-reID is a modified version based on DukeMTMC-reID dataset. There are 12,927 images (665 identifies) in training set, 2,163 images (634 identities) for querying and 9,053 images in the gallery set.
11 PAPERS • 1 BENCHMARK
CUHK03-C is an evaluation set that consists of algorithmically generated corruptions applied to the CUHK03 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
9 PAPERS • 1 BENCHMARK
The Airport dataset is a dataset for person re-identification which consists of 39,902 images and 9,651 identities across six cameras.
8 PAPERS • NO BENCHMARKS YET
Labeled Pedestrian in the Wild (LPW) is a pedestrian detection dataset that contains 2,731 pedestrians in three different scenes where each annotated identity is captured by from 2 to 4 cameras. The LPW features a notable scale of 7,694 tracklets with over 590,000 images as well as the cleanliness of its tracklets. It distinguishes from existing datasets in three aspects: large scale with cleanliness, automatically detected bounding boxes and far more crowded scenes with greater age span. This dataset provides a more realistic and challenging benchmark, which facilitates the further exploration of more powerful algorithms.
SenseReID is a person re-identification dataset for evaluating ReID models. It is captured from real surveillance cameras and the person bounding boxes are obtained from state-of-the-art detection algorithm. The dataset contains 1,717 identities in total.
7 PAPERS • 1 BENCHMARK
LReID is a benchmark for lifelong person reidentification. It has been built using existing datasets, and it consists of two subsets: LReID-Seen and LReID-Unseen.
5 PAPERS • NO BENCHMARKS YET
MSMT17-C is an evaluation set that consists of algorithmically generated corruptions applied to the MSMT17 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
5 PAPERS • 1 BENCHMARK
This dataset contains 114 individuals including 1824 images captured from two disjoint camera views. For each person, eight images are captured from eight different orientations under one camera view and are normalized to 128x48 pixels. This dataset is also split into two parts randomly. One contains 57 individuals for training, and the other contains 57 individuals for testing.
4 PAPERS • 1 BENCHMARK
RegDB-C is an evaluation set that consists of algorithmically generated corruptions applied to the RegDB test-set (color images). These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
SYSU-MM01-C is an evaluation set that consists of algorithmically generated corruptions applied to the SYSU-MM01 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
The images in DukeMTMC-attribute dataset comes from Duke University. There are 1812 identities and 34183 annotated bounding boxes in the DukeMTMC-attribute dataset. This dataset contains 702 identities for training and 1110 identities for testing, corresponding to 16522 and 17661 images respectively. The attributes are annotated in the identity level, every image in this dataset is annotated with 23 attributes.
3 PAPERS • NO BENCHMARKS YET
The Market1501-Attributes dataset is built from the Market1501 dataset. Market1501 Attribute is an augmentation of this dataset with 28 hand annotated attributes, such as gender, age, sleeve length, flags for items carried as well as upper clothes colors and lower clothes colors.
The OpeReid dataset is a person re-identification dataset that consists of 7,413 images of 200 persons.
This dataset contains 1203 individuals captured from two disjoint camera views. To each person, one to twelve images are captured from one to six different orientations under one camera view and are normalized to 128x64 pixels. This dataset is constructed based on the Market-1501 benchmark data and the orientation label for each image has been manually annotated.
2 PAPERS • NO BENCHMARKS YET
This dataset contains 971 identities from two disjoint camera views. Each identity has two samples per camera view. It is used for Person Re-identification.
1 PAPER • NO BENCHMARKS YET
MVB (Multi View Baggage) is a dataset for baggage ReID task which has some essential differences from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset that contains 4519 baggage identities and 22660 annotated baggage images as well as its surface material labels. Second, all baggage images are captured by specially-designed multi-view camera system to handle pose variation and occlusion, in order to obtain the 3D information of baggage surface as complete as possible. Third, MVB has remarkable inter-class similarity and intra-class dissimilarity, considering the fact that baggage might have very similar appearance while the data is collected in two real airport environments, where imaging factors varies significantly from each other.
The ULI-RI dataset is generated using the Unreal Engine 4 to simulate various outdoor environments with 115 high-quality 3D human models. For each person identity, we controlled and quantitatively labeled the illumination intensity, view point (model z-rotation angle), and background to create 512 images. There are total 115 x 512 = 58880 images in the ULI-RI dataset.
The X-MARS dataset proposes new splits for the MARS dataset, to allow for cross-evaluation with the Market-1501 dataset without training and test overlap between the two datasets.