The DukeMTMC-reID (Duke Multi-Target, Multi-Camera Re-Identification) dataset is a subset of the DukeMTMC dataset for image-based person re-identification. It is built from high-resolution videos captured by 8 different cameras and is one of the largest pedestrian image datasets in which images are cropped using hand-drawn bounding boxes. The dataset consists of 16,522 training images of 702 identities, 2,228 query images of another 702 identities, and 17,661 gallery images.
333 PAPERS • 6 BENCHMARKS
MPI (Max Planck Institute) Sintel is a dataset for optical flow evaluation that has 1064 synthesized stereo images and ground truth data for disparity. It is derived from the open-source 3D animated short film Sintel and covers 23 different scenes. The stereo images are RGB, while the disparity maps are grayscale. Both have a resolution of 1024×436 pixels with 8 bits per channel.
189 PAPERS • 6 BENCHMARKS
Grammarly’s Yahoo Answers Formality Corpus (GYAFC) is the largest dataset for any style, containing a total of 110K informal/formal sentence pairs.
99 PAPERS • 3 BENCHMARKS
WikiArt contains paintings from 195 different artists. The dataset has 42,129 images for training and 10,628 images for testing.
83 PAPERS • 2 BENCHMARKS
MPIIGaze is a dataset for appearance-based gaze estimation in the wild. It contains 213,659 images collected from 15 participants during natural everyday laptop use over more than three months. It has a large variability in appearance and illumination.
77 PAPERS • 1 BENCHMARK
POP909 is a dataset containing multiple versions of piano arrangements of 909 popular songs, created by professional musicians. The main body of the dataset contains the vocal melody, the lead instrument melody, and the piano accompaniment for each song in MIDI format, all aligned to the original audio files. Annotations of tempo, beat, key, and chords are also provided; the tempo curves are hand-labelled, and the other annotations are produced by MIR algorithms.
36 PAPERS • NO BENCHMARKS YET
The iKala dataset is a singing voice separation dataset comprising 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for the MIREX data mining contest). The music accompaniment and the singing voice are recorded in the left and right channels, respectively. Human-labeled pitch contours and timestamped lyrics are also provided.
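The channel layout described above (accompaniment on the left, voice on the right) suggests how a clip might be split into separation targets. The following is only a minimal sketch under stated assumptions: it assumes 16-bit interleaved stereo samples, the function names `split_stereo` and `mix_to_mono` are hypothetical, and the sample values are synthetic rather than taken from iKala.

```python
from array import array


def split_stereo(interleaved):
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into two channels.

    Assumption: 16-bit signed samples. Per the dataset description, the left
    channel holds the accompaniment and the right channel holds the voice.
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("interleaved stereo data must have an even length")
    accompaniment = array("h", interleaved[0::2])  # left channel
    voice = array("h", interleaved[1::2])          # right channel
    return accompaniment, voice


def mix_to_mono(accompaniment, voice):
    """Average both channels into a mono mixture (a typical separation input)."""
    return array("h", ((a + v) // 2 for a, v in zip(accompaniment, voice)))


# Synthetic three-frame example, not real iKala audio.
frames = array("h", [100, -20, 200, -40, 300, -60])
acc, vox = split_stereo(frames)
mix = mix_to_mono(acc, vox)
```

With real audio, the interleaved samples could be read from a WAV file using the standard-library `wave` module. The sample width and channel order should be checked against the actual files first.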
20 PAPERS • 1 BENCHMARK
Touchdown is a corpus for executing navigation instructions and resolving spatial descriptions in visual real-world environments. The task is to follow instructions to a goal position and find a hidden object there, Touchdown the bear.
19 PAPERS • 1 BENCHMARK
An annotated image memorability dataset with 60,000 labeled images from a diverse array of sources.
17 PAPERS • NO BENCHMARKS YET
TextSeg is a large-scale, finely annotated, multi-purpose text detection and segmentation dataset. It collects scene and design text with six types of annotations: word- and character-wise bounding polygons, masks, and transcriptions.
15 PAPERS • 1 BENCHMARK
A new dataset of handwritten text with fine-grained annotations at the character level. The authors also report results from an initial user evaluation.
9 PAPERS • NO BENCHMARKS YET
To comprehensively evaluate the effectiveness and generalization ability of style transfer methods, we build StyleBench, which covers 73 distinct styles, ranging from paintings, flat illustrations, and 3D renderings to sculptures with varying materials. For each style, we collect 5-7 distinct images with variations. In total, StyleBench contains 490 images across diverse styles.
7 PAPERS • 1 BENCHMARK
We construct a style-balanced dataset, called StyleGallery, covering several open source datasets. Specifically, StyleGallery includes JourneyDB, a dataset comprising a broad spectrum of diverse styles derived from MidJourney; WIKIART, with extensive fine-grained painting styles such as pointillism and ink drawing; and a subset of stylized images from LAION-Aesthetics.
This dataset aims to facilitate research in caricature recognition. All the caricatures and face images were collected from the Web. Compared with two existing datasets, it is much more challenging, with a much greater number of available images and artistic styles and larger intra-personal variations.
6 PAPERS • NO BENCHMARKS YET
PASTEL is a parallelly annotated stylistic language dataset. The dataset consists of ~41K parallel sentences and 8.3K parallel stories annotated across different personas.
4 PAPERS • NO BENCHMARKS YET
The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability. To tackle these problems, we present GTSinger, a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks. Particularly, (1) we collect 80.59 hours of high-quality songs, forming the largest recorded singing dataset; (2) 20 professional singers across nine languages offer diverse timbres and styles; (3) we provide controlled comparison and phoneme-level annotations of six singing techniques, helping technique modeling and control; (4) GTSinger offers realistic music scores, assisting real-world musical composition; (5) singing voices are accompa
2 PAPERS • NO BENCHMARKS YET
A new text effects dataset with 141,081 text effect/glyph pairs in total. The dataset consists of 152 professionally designed text effects rendered on glyphs, including English letters, Chinese characters, and Arabic numerals.
The Chinese Traditional Painting dataset for style transfer contains 1000 content images and 100 style images. The content images are mostly photorealistic scenes of mountains, lakes, rivers, bridges, and buildings in regions south of the Yangtze River. They include not only scenes of China but also pictures of the Rhine, the Alps, Yellowstone, the Grand Canyon, etc. The style images include diverse types of Chinese traditional paintings.
1 PAPER • NO BENCHMARKS YET
The Gutenberg Poem Dataset is used for the next-verse prediction component.
Image dataset with about 600K Flickr photos.