Data collected from two distinct experiments in immersive, interactive VR in which participants performed dynamic tasks while their eye, head, and hand movements were recorded. In the second experiment, a range of privacy mechanisms is applied to eye gaze in real time.
1 PAPER • NO BENCHMARKS YET
The ULI-RI dataset is generated using Unreal Engine 4 to simulate various outdoor environments with 115 high-quality 3D human models. For each person identity, we controlled and quantitatively labeled the illumination intensity, viewpoint (model z-rotation angle), and background to create 512 images. In total, the ULI-RI dataset contains 115 × 512 = 58,880 images.
Correlated Corrupted Dataset is an evaluation set that consists of realistic visible-infrared (V-I) corruptions for evaluating models' corruption robustness. Initially proposed for multimodal person re-identification, our dataset can also be used to evaluate V-I cross-modal approaches. Corruptions of the visible modality are the twenty corruptions proposed by Chen et al. in the "Benchmarks for Corruption Invariant Person Re-identification" paper. Corruptions of the infrared modality were proposed in our paper, which introduces 19 corruptions that respect the infrared modality encoding. In practice, for co-located visible-infrared cameras, weather-related corruptions should, for example, affect both cameras. Likewise, blur-related corruptions would likely occur in both the visible and infrared cameras. This dataset addresses this aspect by considering the correlations that may occur between the cameras of the two modalities.
Uncorrelated Corrupted Dataset is an evaluation set that consists of realistic visible-infrared (V-I) corruptions for evaluating models' corruption robustness. Initially proposed for multimodal person re-identification, our dataset can also be used to evaluate V-I cross-modal approaches. Corruptions of the visible modality are the twenty corruptions proposed by Chen et al. in the "Benchmarks for Corruption Invariant Person Re-identification" paper. Corruptions of the infrared modality were proposed in our paper, which introduces 19 corruptions that respect the infrared modality encoding. In practice, the corruptions are applied randomly and independently to the visible and infrared cameras, making the dataset better suited to settings where the cameras are not co-located.
Person re-ID matches persons across multiple non-overlapping cameras. Despite the increasing deployment of airborne platforms in surveillance, existing person re-ID benchmarks focus on ground-ground matching, with very limited efforts on aerial-aerial matching. We propose a new benchmark dataset, AG-ReID, which performs person re-ID matching in a new setting: across aerial and ground cameras. Our dataset contains 21,983 images of 388 identities and 15 soft attributes for each identity. The data was collected by a UAV flying at altitudes between 15 and 45 meters and a ground-based CCTV camera on a university campus. Our dataset presents a novel elevated-viewpoint challenge for person re-ID due to the significant difference in person appearance across these cameras.
2 PAPERS • 1 BENCHMARK
The 42Street dataset is based on a theater play as an example of such an application. The dataset is created using a public recording of the 42Street theatre play [42street]. The play is 1.5 hours long and was split into 5 equally long parts of 20 minutes each, with various clothes changes between the different parts.
Multi-view Extended Videos with Identities dataset (MEVID) is a dataset for large-scale video person re-identification (ReID) in the wild. It spans an extensive indoor and outdoor environment across nine unique dates in a 73-day window, with various camera viewpoints and entity clothing changes. Specifically, it contains identity labels for 158 unique people wearing 598 outfits, taken from 8,092 tracklets with an average length of about 590 frames, seen in 33 camera views from the very large-scale MEVA person activities dataset.
3 PAPERS • NO BENCHMARKS YET
The ClonedPerson dataset is a large-scale synthetic person re-identification dataset introduced in the paper "Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification" in CVPR 2022. It is generated by MakeHuman and Unity3D. Characters in this dataset use an automatic approach to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. The dataset contains 887,766 synthesized person images of 5,621 identities.
4 PAPERS • 4 BENCHMARKS
CUHK03-C is an evaluation set that consists of algorithmically generated corruptions applied to the CUHK03 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
9 PAPERS • 1 BENCHMARK
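The count of 100 distinct corruptions quoted for the -C evaluation sets (such as CUHK03-C above) follows from 20 corruption types times 5 severity levels. A minimal sketch of that enumeration is shown below; the identifier spellings and the 1 to 5 severity numbering are assumptions chosen for illustration, not names taken from the benchmark code.

```python
from itertools import product

# Corruption families and types as listed in the dataset description.
CORRUPTIONS = {
    "noise": ["gaussian", "shot", "impulse", "speckle"],
    "blur": ["defocus", "frosted_glass", "motion", "zoom", "gaussian"],
    "weather": ["snow", "frost", "fog", "brightness", "spatter", "rain"],
    "digital": ["contrast", "elastic", "pixel", "jpeg_compression", "saturate"],
}
SEVERITIES = range(1, 6)  # five levels; numbering them 1..5 is an assumption


def corruption_grid():
    """Return every (family, corruption, severity) combination."""
    return [
        (family, name, severity)
        for family, names in CORRUPTIONS.items()
        for name, severity in product(names, SEVERITIES)
    ]


grid = corruption_grid()
num_types = sum(len(names) for names in CORRUPTIONS.values())
print(num_types, len(grid))  # 20 corruption types, 100 (type, severity) pairs
```

Keeping the family in each tuple distinguishes Gaussian noise from Gaussian blur, which would otherwise share a name.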
MSMT17-C is an evaluation set that consists of algorithmically generated corruptions applied to the MSMT17 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
5 PAPERS • 1 BENCHMARK
Market-1501-C is an evaluation set that consists of algorithmically generated corruptions applied to the Market-1501 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
22 PAPERS • 1 BENCHMARK
RegDB-C is an evaluation set that consists of algorithmically generated corruptions applied to the RegDB test-set (color images). These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
4 PAPERS • 1 BENCHMARK
SYSU-MM01-C is an evaluation set that consists of algorithmically generated corruptions applied to the SYSU-MM01 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
RSTPReid contains 20,505 images of 4,101 persons from 15 cameras. Each person has 5 corresponding images taken by different cameras, with complex indoor and outdoor scene transformations and backgrounds in various periods of time, which makes RSTPReid much more challenging and more adaptable to real scenarios. Each image is annotated with 2 textual descriptions. For data division, 3,701 (index < 18505), 200 (18505 <= index < 19505) and 200 (index >= 19505) identities are used for training, validation and testing, respectively (marked by the item 'split' in the JSON file). Each sentence is no shorter than 23 words.
33 PAPERS • 1 BENCHMARK
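The index thresholds given for RSTPReid can be checked against the identity counts with a short sketch. It assumes that `index` is a zero-based image index over the 20,505 images and that each identity contributes exactly 5 images; the function name is hypothetical, and the 'split' item in the dataset's JSON file remains the authoritative source.

```python
from collections import Counter

TOTAL_IMAGES = 20505
IMAGES_PER_IDENTITY = 5  # stated in the dataset description


def split_for_index(index):
    """Map an image index to its split using the thresholds quoted above."""
    if not 0 <= index < TOTAL_IMAGES:
        raise ValueError(f"index {index} is outside the dataset")
    if index < 18505:
        return "train"
    if index < 19505:
        return "val"
    return "test"


# Count images per split, then convert to identities.
image_counts = Counter(split_for_index(i) for i in range(TOTAL_IMAGES))
identity_counts = {
    name: count // IMAGES_PER_IDENTITY for name, count in image_counts.items()
}
print(identity_counts)
```

Under these assumptions the thresholds reproduce the stated 3,701 / 200 / 200 identity split, which sums to the 4,101 persons in the dataset.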
One large-scale database for Text-to-Image Person Re-identification, i.e., Text-based Person Retrieval.
11 PAPERS • 2 BENCHMARKS
UAV-Human is a large dataset for human behavior understanding with UAVs. It contains 67,428 multi-modal video sequences and 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames and 1,144 identities for person re-identification, and 22,263 frames for attribute recognition. The dataset was collected by a flying UAV in multiple urban and rural districts in both daytime and nighttime over three months, hence covering extensive diversity w.r.t. subjects, backgrounds, illumination, weather, occlusions, camera motions, and UAV flying attitudes. This dataset can be used for UAV-based human behavior understanding, including action recognition, pose estimation, re-identification, and attribute recognition.
38 PAPERS • 5 BENCHMARKS
LReID is a benchmark for lifelong person reidentification. It has been built using existing datasets, and it consists of two subsets: LReID-Seen and LReID-Unseen.
5 PAPERS • NO BENCHMARKS YET
The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:
4 PAPERS • 2 BENCHMARKS
LTCC contains 17,119 person images of 152 identities, and each identity is captured by at least two cameras. The dataset can be divided into two subsets: one cloth-change set where 91 persons appear with 416 different sets of outfits in 14,783 images, and one cloth-consistent subset containing the remaining 61 identities with 2,336 images without outfit changes. On average, there are 5 different clothes for each cloth-changing person, with the number of outfit changes ranging from 2 to 14.
26 PAPERS • 2 BENCHMARKS
This dataset consists of 33,698 images from 221 identities. Each person in Cameras A and B wears the same clothes, but the images are captured in different rooms. For Camera C, the person wears different clothes, and the images are captured on a different day.
31 PAPERS • 2 BENCHMARKS
RegDB is used for Visible-Infrared Re-ID which handles the cross-modality matching between the daytime visible and night-time infrared images. The dataset contains images of 412 people. It includes 10 color and 10 thermal images for each person.
52 PAPERS • 2 BENCHMARKS
Occluded-DukeMTMC contains 15,618 training images, 17,661 gallery images, and 2,210 occluded query images. The experimental results on Occluded-DukeMTMC demonstrate the superiority of our method on occluded person re-ID problems; moreover, our method does not need any manual cropping procedure as preprocessing.
26 PAPERS • 1 BENCHMARK
MVB (Multi View Baggage) is a dataset for the baggage ReID task, which has some essential differences from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset that contains 4,519 baggage identities and 22,660 annotated baggage images as well as their surface material labels. Second, all baggage images are captured by a specially designed multi-view camera system to handle pose variation and occlusion, in order to obtain 3D information of the baggage surface that is as complete as possible. Third, MVB has remarkable inter-class similarity and intra-class dissimilarity, since different baggage items might have very similar appearances and the data is collected in two real airport environments whose imaging factors vary significantly from each other.
The Airport dataset is a dataset for person re-identification which consists of 39,902 images and 9,651 identities across six cameras.
8 PAPERS • NO BENCHMARKS YET
Veri-Wild is the largest vehicle re-identification dataset (as of CVPR 2019). The dataset is captured from a large CCTV surveillance system consisting of 174 cameras across one month (30 × 24 h) under unconstrained scenarios. This dataset comprises 416,314 vehicle images of 40,671 identities. Evaluation on this dataset is split across three subsets: small, medium and large, comprising 3,000, 5,000 and 10,000 identities respectively (in probe and gallery sets).
38 PAPERS • 3 BENCHMARKS
The iQIYI-VID dataset comprises video clips from iQIYI variety shows, films, and television dramas. The whole dataset contains 500,000 video clips of 5,000 celebrities. The length of each video is 1 to 30 seconds.
6 PAPERS • NO BENCHMARKS YET
The DukeMTMC-VideoReID (Duke Multi-Tracking Multi-Camera Video-based ReIDentification) dataset is a subset of the DukeMTMC for video-based person re-ID. The dataset is created from high-resolution videos from 8 different cameras. It is one of the largest pedestrian video datasets wherein images are cropped by hand-drawn bounding boxes. The dataset consists of 4,832 tracklets of 1,812 identities in total, and each tracklet has 168 frames on average.
48 PAPERS • 2 BENCHMARKS
P-DukeMTMC-reID is a modified version of the DukeMTMC-reID dataset. There are 12,927 images (665 identities) in the training set, 2,163 images (634 identities) for querying, and 9,053 images in the gallery set.
11 PAPERS • 1 BENCHMARK
Occluded REID is an occluded person dataset captured by mobile cameras, consisting of 2,000 images of 200 occluded persons (see Fig. (c)). Each identity has 5 full-body person images and 5 occluded person images with different types of occlusion.
59 PAPERS • 1 BENCHMARK
Labeled Pedestrian in the Wild (LPW) is a pedestrian detection dataset that contains 2,731 pedestrians in three different scenes, where each annotated identity is captured by 2 to 4 cameras. LPW features a notable scale of 7,694 tracklets with over 590,000 images, as well as clean tracklets. It differs from existing datasets in three aspects: large scale with cleanliness, automatically detected bounding boxes, and far more crowded scenes with a greater age span. This dataset provides a more realistic and challenging benchmark, which facilitates the further exploration of more powerful algorithms.
MSMT17 is a multi-scene multi-time person re-identification dataset. The dataset consists of 180 hours of videos, captured by 12 outdoor cameras and 3 indoor cameras during 12 time slots. The videos cover a long period of time and present complex lighting variations, and the dataset contains a large number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes.
237 PAPERS • 6 BENCHMARKS
SenseReID is a person re-identification dataset for evaluating ReID models. It is captured from real surveillance cameras, and the person bounding boxes are obtained from a state-of-the-art detection algorithm. The dataset contains 1,717 identities in total.
7 PAPERS • 1 BENCHMARK
The CUHK-SYSU dataset is a large-scale benchmark for person search, containing 18,184 images and 8,432 identities. Unlike previous re-ID benchmarks, which match query persons against manually cropped pedestrians, this dataset is much closer to real application scenarios because it searches for persons in whole gallery images.
92 PAPERS • 2 BENCHMARKS
The images in the DukeMTMC-attribute dataset come from Duke University. There are 1812 identities and 34183 annotated bounding boxes in the DukeMTMC-attribute dataset. This dataset contains 702 identities for training and 1110 identities for testing, corresponding to 16522 and 17661 images respectively. The attributes are annotated at the identity level; every image in this dataset is annotated with 23 attributes.
The Market1501-Attributes dataset is built from the Market1501 dataset. Market1501 Attribute is an augmentation of this dataset with 28 hand annotated attributes, such as gender, age, sleeve length, flags for items carried as well as upper clothes colors and lower clothes colors.
The SYSU-MM01 is a dataset collected for the Visible-Infrared Re-identification problem. The images in the dataset were obtained from 491 different persons by recording them using 4 RGB and 2 infrared cameras. Within the dataset, the persons are divided into 3 fixed splits to create training, validation and test sets. In the training set, there are 20284 RGB and 9929 infrared images of 296 persons. The validation set contains 1974 RGB and 1980 infrared images of 99 persons. The testing set consists of the images of 96 persons where 3803 infrared images are used as query and 301 randomly selected RGB images are used as gallery.
89 PAPERS • 2 BENCHMARKS
The DukeMTMC-reID (Duke Multi-Tracking Multi-Camera ReIDentification) dataset is a subset of the DukeMTMC for image-based person re-ID. The dataset is created from high-resolution videos from 8 different cameras. It is one of the largest pedestrian image datasets wherein images are cropped by hand-drawn bounding boxes. The dataset consists of 16,522 training images of 702 identities, 2,228 query images of the other 702 identities, and 17,661 gallery images.
326 PAPERS • 6 BENCHMARKS
PRW is a large-scale dataset for end-to-end pedestrian detection and person recognition in raw video frames. PRW is introduced to evaluate Person Re-identification in the Wild, using videos acquired through six synchronized cameras. It contains 932 identities and 11,816 frames in which pedestrians are annotated with their bounding box positions and identities.
69 PAPERS • NO BENCHMARKS YET
MARS (Motion Analysis and Re-identification Set) is a large scale video based person reidentification dataset, an extension of the Market-1501 dataset. It has been collected from six near-synchronized cameras. It consists of 1,261 different pedestrians, who are captured by at least 2 cameras. The variations in poses, colors and illuminations of pedestrians, as well as the poor image quality, make it very difficult to yield high matching accuracy. Moreover, the dataset contains 3,248 distractors in order to make it more realistic. Deformable Part Model and GMMCP tracker were used to automatically generate the tracklets (mostly 25-50 frames long).
169 PAPERS • 2 BENCHMARKS
Market-1501 is a large-scale public benchmark dataset for person re-identification. It contains 1501 identities which are captured by six different cameras, and 32,668 pedestrian image bounding-boxes obtained using the Deformable Part Models pedestrian detector. Each person has 3.6 images on average at each viewpoint. The dataset is split into two parts: 750 identities are utilized for training and the remaining 751 identities are used for testing. In the official testing protocol 3,368 query images are selected as probe set to find the correct match across 19,732 reference gallery images.
812 PAPERS • 9 BENCHMARKS
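The query-versus-gallery protocol described for Market-1501 can be illustrated with a toy rank-1 matching sketch. This is a simplified illustration under stated assumptions, not the official evaluation code: the feature vectors are made-up 2-D points, Euclidean distance is an arbitrary choice, and the function name is hypothetical. Any camera-based filtering of gallery candidates that a full protocol may apply is omitted here.

```python
import math


def rank1_accuracy(queries, gallery):
    """Fraction of queries whose nearest gallery entry has the same identity.

    Both arguments are lists of (identity, feature_vector) pairs.
    """
    hits = 0
    for query_id, query_feat in queries:
        # Nearest gallery entry by Euclidean distance in feature space.
        best_id, _ = min(gallery, key=lambda item: math.dist(query_feat, item[1]))
        hits += best_id == query_id
    return hits / len(queries)


# Toy data: three gallery identities and three probe images.
gallery = [(1, (0.0, 0.0)), (2, (1.0, 1.0)), (3, (5.0, 5.0))]
queries = [(1, (0.1, -0.1)), (2, (0.9, 1.2)), (3, (0.2, 0.1))]
print(rank1_accuracy(queries, gallery))  # the third query is mismatched
```

On the real benchmark the same idea would run over the 3,368 query images and 19,732 gallery images, with features produced by a trained re-ID model.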
Partial REID is a specially designed partial person reidentification dataset that includes 600 images from 60 people, with 5 full-body images and 5 occluded images per person. These images were collected on a university campus by 6 cameras from different viewpoints, backgrounds and different types of occlusion. The examples of partial persons in the Partial REID dataset are shown in the Figure.
39 PAPERS • NO BENCHMARKS YET
CUHK03 consists of 14,097 images of 1,467 different identities, where 6 campus cameras were deployed for image collection and each identity is captured by 2 campus cameras. This dataset provides two types of annotations: manually labelled bounding boxes and bounding boxes produced by an automatic detector. The dataset also provides 20 random train/test splits in which 100 identities are selected for testing and the rest for training.
398 PAPERS • 8 BENCHMARKS
The OpeReid dataset is a person re-identification dataset that consists of 7,413 images of 200 persons.
PRID 2011 is a person re-identification dataset that provides multiple person trajectories recorded from two different static surveillance cameras, monitoring crosswalks and sidewalks. The dataset shows a clean background, and the people in the dataset are rarely occluded. In the dataset, 200 people appear in both views. Among the 200 people, 178 people have more than 20 appearances.
18 PAPERS • 2 BENCHMARKS
Partial iLIDS is a dataset for occluded person re-identification. It contains a total of 476 images of 119 people captured by 4 non-overlapping cameras. Some images contain people occluded by other individuals or luggage.
12 PAPERS • NO BENCHMARKS YET
The iLIDS-VID dataset is a person re-identification dataset which involves 300 different pedestrians observed across two disjoint camera views in public open space. It comprises 600 image sequences of 300 distinct individuals, with one pair of image sequences from two camera views for each person. Each image sequence has variable length ranging from 23 to 192 image frames, with an average number of 73. The iLIDS-VID dataset is very challenging due to clothing similarities among people, lighting and viewpoint variations across camera views, cluttered background and random occlusions.
19 PAPERS • 2 BENCHMARKS
ETH is a dataset for pedestrian detection. The testing set contains 1,804 images in three video clips. The dataset is captured from a stereo rig mounted on a car, with a resolution of 640 x 480 (bayered) and a frame rate of 13-14 FPS.
59 PAPERS • 5 BENCHMARKS
This dataset contains 971 identities from two disjoint camera views. Each identity has two samples per camera view. It is used for Person Re-identification.