The Vimeo-90K is a large-scale high-quality video dataset for lower-level video processing. It proposes three different video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
195 PAPERS • 3 BENCHMARKS
Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.
120 PAPERS • 1 BENCHMARK
Spring is a large, high-resolution and high-detail, computer-generated benchmark for scene flow, optical flow, and stereo. Based on rendered scenes from the open-source Blender movie "Spring", it provides photo-realistic HD datasets with state-of-the-art visual effects and ground truth training data.
20 PAPERS • 3 BENCHMARKS
The Blackbird unmanned aerial vehicle (UAV) dataset is a large-scale, aggressive indoor flight dataset collected using a custom-built quadrotor platform for use in evaluation of agile perception. The Blackbird dataset contains over 10 hours of flight data from 168 flights over 17 flight trajectories and 5 environments. Each flight includes sensor data from 120Hz stereo and downward-facing photorealistic virtual cameras, 100Hz IMU, motor speed sensors, and 360Hz millimeter-accurate motion capture ground truth. Camera images for each flight were photorealistically rendered using FlightGoggles across a variety of environments to facilitate easy experimentation of high performance perception algorithms.
9 PAPERS • NO BENCHMARKS YET
TUM-GAID (TUM Gait from Audio, Image and Depth) collects 305 subjects performing two walking trajectories in an indoor environment. The first trajectory is traversed from left to right and the second one from right to left. Two recording sessions were performed, one in January, where subjects wore heavy jackets and mostly winter boots, and another one in April, where subjects wore lighter clothes. The action is captured by a Microsoft Kinect sensor which provides a video stream with a resolution of 640×480 pixels and a frame rate around 30 FPS.
The SAMM Long Videos dataset consists of 147 long videos with 343 macro-expressions and 159 micro-expressions. The dataset is FACS-coded with detailed Action Units.
8 PAPERS • 1 BENCHMARK
The TUB CrowdFlow is a synthetic dataset that contains 10 sequences showing 5 scenes. Each scene is rendered twice: with a static point of view and a dynamic camera to simulate drone/UAV based surveillance. The scenes are render using Unreal Engine at HD resolution (1280x720) at 25 fps, which is typical for current commercial CCTV surveillance systems. The total number of frames is 3200.
4 PAPERS • NO BENCHMARKS YET
QUVA Repetition dataset consists of 100 videos displaying a wide variety of repetitive video dynamics, including swimming, stirring, cutting, combing and music-making. All videos have been annotated with individual cycle bounds and a total repetition count.