VoxCeleb1 is an audio dataset containing over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube.
604 PAPERS • 9 BENCHMARKS
The Event-Camera Dataset is a collection of datasets recorded with an event-based camera for high-speed robotics. The data also include intensity images, inertial measurements, and ground truth from a motion-capture system. An event-based camera is a revolutionary vision sensor with three key advantages: a measurement rate that is almost 1 million times faster than standard cameras, a latency of 1 microsecond, and a high dynamic range of 130 decibels (standard cameras only have 60 dB). These properties enable the design of a new class of algorithms for high-speed robotics, where standard cameras suffer from motion blur and high latency. All the data are released both as text files and binary (i.e., rosbag) files.
43 PAPERS • 2 BENCHMARKS
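To put the dynamic-range figures quoted for the Event-Camera Dataset in perspective, the short sketch below converts decibels to a brightest-to-darkest intensity ratio. It assumes the 20·log10 convention commonly used for image-sensor dynamic range; the function name `db_to_contrast_ratio` is illustrative and not part of any dataset tooling.

```python
def db_to_contrast_ratio(db: float) -> float:
    """Convert a dynamic range in dB to an intensity ratio.

    Assumes the 20 * log10(ratio) convention typically used for image sensors.
    """
    return 10 ** (db / 20)


event_ratio = db_to_contrast_ratio(130)     # event-based camera, per the description
standard_ratio = db_to_contrast_ratio(60)   # standard camera, per the description

print(f"event camera:    about {event_ratio:,.0f} : 1")
print(f"standard camera: about {standard_ratio:,.0f} : 1")
print(f"difference:      about {event_ratio / standard_ratio:,.0f}x")
```

Under that convention, 60 dB corresponds to roughly 1,000:1 and 130 dB to a little over 3,000,000:1.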
The Multi Vehicle Stereo Event Camera (MVSEC) dataset is a collection of data designed for the development of novel 3D perception algorithms for event-based cameras. Stereo event data is collected from car, motorbike, hexacopter and handheld platforms, and fused with lidar, IMU, motion capture and GPS to provide ground truth pose and depth images.
25 PAPERS • 1 BENCHMARK
MGif is a dataset of videos containing movements of different cartoon animals. Each video is an animated GIF file. The dataset consists of 1,000 videos. The dataset is particularly challenging because of the high appearance variation and motion diversity.
13 PAPERS • 1 BENCHMARK
To create the TED-talks dataset, 3,035 YouTube videos were downloaded using the "TED talks" query. From these initial candidates, videos were selected in which the upper part of the person is visible for at least 64 frames and the height of the person bounding box is at least 384 pixels. Static videos, and videos in which a person is doing something other than presenting, were then manually filtered out.
10 PAPERS • 1 BENCHMARK
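The automatic selection criteria described for the TED-talks dataset can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Clip` record and its field names are hypothetical, and it assumes the 384-pixel threshold applies to the smallest bounding-box height observed across the visible frames. The manual filtering step is not modeled.

```python
from dataclasses import dataclass

MIN_FRAMES = 64         # upper body visible for at least 64 frames (from the description)
MIN_BBOX_HEIGHT = 384   # person bounding box at least 384 pixels tall (from the description)


@dataclass
class Clip:
    """Hypothetical per-clip metadata; field names are illustrative only."""
    video_id: str
    visible_frames: int    # frames in which the upper body is visible
    min_bbox_height: int   # smallest person bounding-box height in those frames, in pixels


def passes_automatic_filter(clip: Clip) -> bool:
    """Return True if the clip meets both thresholds from the description."""
    return (clip.visible_frames >= MIN_FRAMES
            and clip.min_bbox_height >= MIN_BBOX_HEIGHT)


# Made-up candidates, purely to exercise the filter.
candidates = [
    Clip("a", visible_frames=120, min_bbox_height=400),
    Clip("b", visible_frames=40, min_bbox_height=500),   # too few frames
    Clip("c", visible_frames=64, min_bbox_height=384),   # exactly at both thresholds
    Clip("d", visible_frames=200, min_bbox_height=300),  # bounding box too small
]

selected = [c.video_id for c in candidates if passes_automatic_filter(c)]
print(selected)
```

Both thresholds are treated as inclusive, matching the "at least" wording in the description.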
Tai-Chi-HD is a high-resolution dataset that can be used as a reference benchmark for evaluating frameworks for image animation and video generation. It consists of cropped videos of full human bodies performing Tai Chi actions.
9 PAPERS • 2 BENCHMARKS
SEN12MS-CR-TS is a multi-modal and multi-temporal data set for cloud removal. It contains time series of paired and co-registered Sentinel-1 and cloudy as well as cloud-free Sentinel-2 data from the European Space Agency's Copernicus mission. Each time series contains 30 cloudy and clear observations regularly sampled throughout the year 2018. Our multi-temporal data set is readily pre-processed and backward-compatible with SEN12MS-CR.
7 PAPERS • 1 BENCHMARK
To evaluate performance on 4K burst images and video, we collected several clips from the web. The dataset can be downloaded from: https://drive.google.com/file/d/1YDljUONvyKUO24smTx__CUH_4Zxhle09/view?usp=sharing
1 PAPER • 1 BENCHMARK