TrackingNet is a large-scale tracking dataset consisting of videos in the wild. It has a total of 30,643 videos split into 30,132 training videos and 511 testing videos, with an average of 470,9 frames.
182 PAPERS • 2 BENCHMARKS
OTB-2015, also referred as Visual Tracker Benchmark, is a visual tracking dataset. It contains 100 commonly used video sequences for evaluating visual tracking. Image Source: http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html
174 PAPERS • 1 BENCHMARK
OTB2013 is the previous version of the current OTB2015 Visual Tracker Benchmark. It contains only 50 tracking sequences, as opposed to the 100 sequences in the current version of the benchmark.
110 PAPERS • 2 BENCHMARKS
Source: https://www.vicos.si/Projects/CDTB 4.2 State-of-the-art Comparison A TH CTB (color-and-depth visual object tracking) dataset is recorded by several passive and active RGB-D setups and contains indoor as well as outdoor sequences acquired in direct sunlight. The sequences were recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence to enable tracker evaluation under realistic conditions. Sequences are per-frame annotated with 13 visual attributes for detailed analysis. It contains around 100,000 samples. Image Source: https://www.vicos.si/Projects/CDTB
16 PAPERS • NO BENCHMARKS YET
The dataset comprises 25 short sequences showing various objects in challenging backgrounds. Eight sequences are from the VOT2013 challenge (bolt, bicycle, david, diving, gymnastics, hand, sunshade, woman). The new sequences show complementary objects and backgrounds, for example a fish underwater or a surfer riding a big wave. The sequences were chosen from a large pool of sequences using a methodology based on clustering visual features of object and background so that those 25 sequences sample evenly well the existing pool.
12 PAPERS • 1 BENCHMARK
Description The consists of 92 groups of video clips with 113, 918 high resolution frames taken by two drones and 63 groups of video clips with 145, 875 high resolution frames taken by three drones.
4 PAPERS • NO BENCHMARKS YET
ARKitTrack is a new RGB-D tracking dataset for both static and dynamic scenes captured by consumer-grade LiDAR scanners equipped on Apple's iPhone and iPad. ARKitTrack contains 300 RGBD sequences, 455 targets, and 229.7K video frames in total. This dataset has 123.9K pixel-level target masks along with the bounding box annotations and frame-level attributes.
1 PAPER • NO BENCHMARKS YET
This dataset contains nine video sequences captured by a webcam for salient closed boundary tracking evaluation. Each sequence is about 30 sec (30 fps) and the frame size is 640×480 (width×height). There are 9598 frames in total. In each sequence, different motion styles such as translation, rotation and viewpoint changing are all performed.