UCF101 dataset is an extension of UCF50 and consists of 13,320 video clips, which are classified into 101 categories. These 101 categories can be classified into 5 types (Body motion, Human-human interactions, Human-object interactions, Playing musical instruments and Sports). The total length of these video clips is over 27 hours. All the videos are collected from YouTube and have a fixed frame rate of 25 FPS with the resolution of 320 × 240.
1,252 PAPERS • 20 BENCHMARKS
The Densely Annotation Video Segmentation dataset (DAVIS) is a high quality and high resolution densely annotated video segmentation dataset under two resolutions, 480p and 1080p. There are 50 video sequences with 3455 densely annotated frames in pixel level. 30 videos with 2079 frames are for training and 20 videos with 1376 frames are for validation.
460 PAPERS • 10 BENCHMARKS
The Office dataset contains 31 object categories in three domains: Amazon, DSLR and Webcam. The 31 categories in the dataset consist of objects commonly encountered in office settings, such as keyboards, file cabinets, and laptops. The Amazon domain contains on average 90 images per class and 2817 images in total. As these images were captured from a website of online merchants, they are captured against clean background and at a unified scale. The DSLR domain contains 498 low-noise high resolution images (4288×2848). There are 5 objects per category. Each object was captured from different viewpoints on average 3 times. For Webcam, the 795 images of low resolution (640×480) exhibit significant noise and color as well as white balance artifacts.
456 PAPERS • 9 BENCHMARKS
The Middlebury Stereo dataset consists of high-resolution stereo sequences with complex geometry and pixel-accurate ground-truth disparity data. The ground-truth disparities are acquired using a novel technique that employs structured lighting and does not require the calibration of the light projectors.
185 PAPERS • 5 BENCHMARKS
The Vimeo-90K is a large-scale high-quality video dataset for lower-level video processing. It proposes three different video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
132 PAPERS • 2 BENCHMARKS
Description: 10,000 People - Human Pose Recognition Data. This dataset includes indoor and outdoor scenes.This dataset covers males and females. Age distribution ranges from teenager to the elderly, the middle-aged and young people are the majorities. The data diversity includes different shooting heights, different ages, different light conditions, different collecting environment, clothes in different seasons, multiple human poses. For each subject, the labels of gender, race, age, collecting environment and clothes were annotated. The data can be used for human pose recognition and other tasks.
45 PAPERS • 1 BENCHMARK
Dataset of high-resolution (4096×2160), high-fps (1000fps) video frames with extreme motion. X-TEST consists of 15 video clips with 33-length of 4K-1000fps frames. X-TRAIN consists of 4,408 clips from various types of 110 scenes. The clips are 65-length of 1000fps frames
12 PAPERS • 1 BENCHMARK
This is a dataset for video frame interpolation task. The dataset contains the 1920×1080 videos in 240 FPS for videos captured with iPhone 11 and in 120 FPS for gaming content captured with OBS.
9 PAPERS • 1 BENCHMARK
SlowFlow is an optical flow dataset collected by applying Slow Flow technique on data from a high-speed camera and analyzing the performance of the state-of-the-art in optical flow under various levels of motion blur.
7 PAPERS • NO BENCHMARKS YET
ATD-12K is a large-scale animation triplet dataset, which comprises 12,000 triplets(train10k,test2k) by manually inspect and the test2k with rich annotations, including levels of difficulty, the Regions of Interest (RoIs) on movements, and tags on motion categories
6 PAPERS • 1 BENCHMARK
A video dataset for benchmarking upsampling methods. Inter4K contains 1,000 ultra-high resolution videos with 60 frames per second (fps) from online resources. The dataset provides standardized video resolutions at ultra-high definition (UHD/4K), quad-high definition (QHD/2K), full-high definition (FHD/1080p), (standard) high definition (HD/720p), one quarter of full HD (qHD/520p) and one ninth of a full HD (nHD/360p). We use frame rates of 60, 50, 30, 24 and 15 fps for each resolution. Based on this standardization, both super-resolution and frame interpolation tests can be performed for different scaling sizes ($\times 2$, $\times 3$ and $\times 4$). In this paper, we use Inter4K to address frame upsampling and interpolation. Inter4K provides both standardized UHD resolution and 60 fps for all of videos by also containing a diverse set of 1,000 5-second videos. Differences between scenes originate from the equipment (e.g., professional 4K cameras or phones), lighting conditions, vari
3 PAPERS • NO BENCHMARKS YET
To test interpolation performance on various texture types, we developed a new test set, VFITex, which contains twenty 100-frame UHD or HD videos at 24, 30 or 50 FPS, collected from the Xiph, Mitch Martinez Free 4K Stock Footage, UVG database and pexels.com. This dataset covers diverse textured scenes, including crowds, flags, foliage, animals, water, leaves, fire and smoke. HD patches were center-cropped from the UHD sequences, preserving the original UHD characteristics. All frames in each sequence were used for evaluation, totaling 940 quintuplets.
1 PAPER • 1 BENCHMARK