Tracking by Natural Language (TNL2K) is constructed for the evaluation of tracking by natural language specification. TNL2K features:

  • Large-scale: 2,000 sequences comprising 1,244,340 frames and 663 words, split into 1,300 sequences for training and 700 for testing

  • High-quality: Manual annotation with careful inspection of each frame

  • Multi-modal: Providing visual and language annotation for each sequence

  • Adversarial-samples: Randomly adding adversarial samples for research on adversarial attack and defence

  • Significant-appearance-variation: Containing videos in which pedestrians change clothing or face

  • Heterogeneous: Containing RGB, thermal, cartoon, and synthetic data

  • Multiple-baseline: Tracking-by-BBox, Tracking-by-Language, Tracking-by-Joint-BBox-Language
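The three baseline settings differ only in what the tracker is given at initialization: the first-frame bounding box, the language description, or both. The following is a minimal sketch of that distinction, not the dataset's official loader. The class names `AnnotatedSequence` and `TrackerInit`, the function `make_init`, the setting strings, and the (x, y, width, height) box convention are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box convention (x, y, width, height); the real annotation format may differ.
BBox = Tuple[float, float, float, float]


@dataclass
class AnnotatedSequence:
    """Hypothetical container for one sequence with visual and language annotation."""
    name: str
    boxes: List[BBox]   # one bounding box per frame
    description: str    # natural-language specification of the target


@dataclass
class TrackerInit:
    """What a tracker receives before it starts tracking."""
    first_box: Optional[BBox]
    description: Optional[str]


def make_init(seq: AnnotatedSequence, setting: str) -> TrackerInit:
    """Build the initialization for one of the three baseline settings."""
    if setting == "bbox":        # Tracking-by-BBox
        return TrackerInit(seq.boxes[0], None)
    if setting == "language":    # Tracking-by-Language
        return TrackerInit(None, seq.description)
    if setting == "joint":       # Tracking-by-Joint-BBox-Language
        return TrackerInit(seq.boxes[0], seq.description)
    raise ValueError(f"unknown setting: {setting!r}")
```

A tracker evaluated under the language-only setting would then receive `make_init(seq, "language")`, which carries no box at all, so it must localize the target from the description alone.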

Source: Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

