Object Tracking Benchmark (OTB) is a visual tracking benchmark that is widely used to evaluate the performance of a visual tracking algorithm. The dataset contains a total of 100 sequences and each is annotated frame-by-frame with bounding boxes and 11 challenge attributes. OTB-2013 dataset contains 51 sequences and the OTB-2015 dataset contains all 100 sequences of the OTB dataset.
320 PAPERS • 4 BENCHMARKS
OTB-2015, also referred as Visual Tracker Benchmark, is a visual tracking dataset. It contains 100 commonly used video sequences for evaluating visual tracking. Image Source: http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html
157 PAPERS • 1 BENCHMARK
LaSOT is a high-quality benchmark for Large-scale Single Object Tracking. LaSOT consists of 1,400 sequences with more than 3.5M frames in total. Each frame in these sequences is carefully and manually annotated with a bounding box, making LaSOT one of the largest densely annotated tracking benchmark. The average video length of LaSOT is more than 2,500 frames, and each sequence comprises various challenges deriving from the wild where target objects may disappear and re-appear again in the view.
127 PAPERS • 1 BENCHMARK
OTB2013 is the previous version of the current OTB2015 Visual Tracker Benchmark. It contains only 50 tracking sequences, as opposed to the 100 sequences in the current version of the benchmark.
106 PAPERS • 2 BENCHMARKS
VOT2018 is a dataset for visual object tracking. It consists of 60 challenging videos collected from real-life datasets.
106 PAPERS • 1 BENCHMARK
The GOT-10k dataset contains more than 10,000 video segments of real-world moving objects and over 1.5 million manually labelled bounding boxes. The dataset contains more than 560 classes of real-world moving objects and 80+ classes of motion patterns.
104 PAPERS • 1 BENCHMARK
VOT2016 is a video dataset for visual object tracking. It contains 60 video clips and 21,646 corresponding ground truth maps with pixel-wise annotation of salient objects.
102 PAPERS • 1 BENCHMARK
TrackingNet is a large-scale tracking dataset consisting of videos in the wild. It has a total of 30,643 videos split into 30,132 training videos and 511 testing videos, with an average of 470,9 frames.
96 PAPERS • 1 BENCHMARK
VOT2017 is a Visual Object Tracking dataset for different tasks that contains 60 short sequences annotated with 6 different attributes.
52 PAPERS • 2 BENCHMARKS
Youtube-VOS is a Video Object Segmentation dataset that contains 4,453 videos - 3,471 for training, 474 for validation, and 508 for testing. The training and validation videos have pixel-level ground truth annotations for every 5th frame (6 fps). It also contains Instance Segmentation annotations. It has more than 7,800 unique objects, 190k high-quality manual annotations and more than 340 minutes in duration.
26 PAPERS • 4 BENCHMARKS
A new long video dataset and benchmark for single object tracking. The dataset consists of 50 HD videos from real world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20 folds larger in average duration per sequence and more than 8 folds larger in terms of total covered duration, as compared to existing generic datasets for visual tracking.
11 PAPERS • NO BENCHMARKS YET
Source: https://www.vicos.si/Projects/CDTB 4.2 State-of-the-art Comparison A TH CTB (color-and-depth visual object tracking) dataset is recorded by several passive and active RGB-D setups and contains indoor as well as outdoor sequences acquired in direct sunlight. The sequences were recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence to enable tracker evaluation under realistic conditions. Sequences are per-frame annotated with 13 visual attributes for detailed analysis. It contains around 100,000 samples. Image Source: https://www.vicos.si/Projects/CDTB
7 PAPERS • NO BENCHMARKS YET
VOT2019 is a Visual Object Tracking benchmark for short-term tracking in RGB.
3 PAPERS • 1 BENCHMARK
TREK-150 is a benchmark dataset for object tracking in First Person Vision (FPV) videos composed of 150 densely annotated video sequences.
2 PAPERS • NO BENCHMARKS YET