The Multiple Object Tracking 17 (MOT17) dataset is a benchmark dataset for multiple object tracking. Like its predecessor MOT16, the challenge contains seven different indoor and outdoor scenes of public places, with pedestrians as the objects of interest. The video for each scene is divided into two clips, one for training and the other for testing. The dataset provides detections of objects in the video frames from three detectors, namely SDP, Faster R-CNN, and DPM. The challenge accepts both online and offline tracking approaches, where the latter may use future video frames to predict tracks. (A sketch of the detection file format follows this entry.)
291 PAPERS • 2 BENCHMARKS
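The sequence-level detection files in MOT17 (and the other MOTChallenge datasets below) use the published comma-separated MOTChallenge layout. The following is a minimal parsing sketch, assuming that layout; the file path is illustrative and error handling is omitted.

```python
import csv
from collections import defaultdict

def load_mot_detections(path):
    """Group MOT-style detections by frame number.

    Each det.txt line is comma-separated:
    frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z
    (id and x/y/z are -1 for raw detections).
    """
    per_frame = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame = int(row[0])
            left, top, width, height = (float(v) for v in row[2:6])
            conf = float(row[6])
            per_frame[frame].append((left, top, width, height, conf))
    return per_frame

# Illustrative usage (path is hypothetical):
# dets = load_mot_detections("MOT17-02-DPM/det/det.txt")
# dets[1] -> [(left, top, width, height, confidence), ...] for frame 1
```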
The MOTChallenge datasets are designed for the task of multiple object tracking. Several variants have been released over the years, including MOT15, MOT17, and MOT20.
192 PAPERS • 8 BENCHMARKS
The MOT16 dataset is a benchmark dataset for multiple object tracking. It is a collection of existing and new data, containing 14 challenging real-world videos of both static and moving scenes, 7 for training and 7 for testing. It is a large-scale dataset, comprising 110,407 bounding boxes in the training set and 182,326 in the test set. All video sequences are annotated under strict standards; the ground truth is highly accurate, making evaluation meaningful.
149 PAPERS • 2 BENCHMARKS
MOT2015 is a dataset for multiple object tracking. It contains 11 different indoor and outdoor scenes of public places with pedestrians as the objects of interest, where camera motion, camera angle, and imaging conditions vary greatly. The dataset provides detections generated by an ACF-based detector.
67 PAPERS • 5 BENCHMARKS
Multi-object tracking (MOT) is a fundamental task in computer vision, aiming to estimate the bounding boxes and identities of objects (e.g., pedestrians and vehicles) in video sequences.
32 PAPERS • 3 BENCHMARKS
The Multi-camera Multiple People Tracking (MMPTRACK) dataset has about 9.6 hours of video, with over half a million frame-wise annotations. The dataset is densely annotated: per-frame bounding boxes and person identities are available, as well as camera calibration parameters. The videos are recorded at 15 frames per second (FPS) in five diverse and challenging environments: retail, lobby, industry, cafe, and office. This is by far the largest publicly available multi-camera multiple people tracking dataset. (A projection sketch using the calibration parameters follows this entry.)
6 PAPERS • 1 BENCHMARK
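Since MMPTRACK ships camera calibration parameters alongside the annotations, cross-view reasoning typically reduces to standard pinhole projection. Below is a minimal sketch, assuming a pinhole model with intrinsics K and extrinsics (R, t); the numeric values are placeholders, not values from the dataset.

```python
import numpy as np

def project_point(K, R, t, X_world):
    """Project a 3D world point to pixel coordinates with a
    pinhole camera model: x ~ K (R X + t)."""
    X_cam = R @ X_world + t   # world -> camera coordinates
    x = K @ X_cam             # camera -> homogeneous pixel coordinates
    return x[:2] / x[2]       # perspective divide

# Placeholder calibration; real K, R, t come from the dataset's
# calibration parameters.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.5])
print(project_point(K, R, t, np.array([0.5, 0.0, 5.0])))  # ~[1026.7, 540.0]
```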
We collected 32 videos recording bee colony activity at different times of day across several sunny days. The dataset totals 3,562 frames and 43,169 annotations.
2 PAPERS • NO BENCHMARKS YET
CholecTrack20 is a surgical video dataset for tool tracking in laparoscopic cholecystectomy, featuring 20 annotated videos. It provides labels for multi-class, multi-tool tracking under three trajectory perspectives: tool visibility within the camera scope, intracorporeal movement within the patient's body, and the life-long intraoperative trajectory of each tool. Annotations cover spatial coordinates, tool class, operator identity, surgical phase, and visual conditions (occlusion, bleeding, smoke) for seven tool types (grasper, bipolar, hook, scissors, clipper, irrigator, and specimen bag), provided at 1 frame per second across 35K frames and 65K tool instance labels. The official splits allocate 10 videos for training, 2 for validation, and 8 for testing. (A hypothetical record layout follows this entry.)
2 PAPERS • 3 BENCHMARKS
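To make the annotation structure concrete, here is a hypothetical record layout for a single tool instance label, derived only from the fields named above; the actual CholecTrack20 file schema and field names may differ.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical layout for one instance label; field names are
# illustrative, not the dataset's actual schema.
@dataclass
class ToolInstance:
    frame: int                  # annotated at 1 frame per second
    bbox: Tuple[float, float, float, float]  # spatial coordinates (x, y, w, h)
    tool_class: str             # e.g. "grasper", "hook", "clipper"
    operator: str               # operator identity
    phase: str                  # surgical phase
    occlusion: bool             # visual conditions
    bleeding: bool
    smoke: bool
    track_visibility: int       # track id within the camera scope
    track_intracorporeal: int   # track id within the patient's body
    track_intraoperative: int   # life-long intraoperative track id
```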
PersonPath22 is a large-scale multi-person tracking dataset containing 236 videos, captured mostly from static-mounted cameras and collected from sources that granted redistribution rights and whose participants gave explicit consent. Each video has ground-truth annotations, including both bounding boxes and tracklet IDs for all persons in each frame (see the tracklet-grouping sketch after this entry).
2 PAPERS • 1 BENCHMARK
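Per-frame boxes paired with tracklet IDs, as in PersonPath22, can be regrouped into per-identity tracklets with a few lines of bookkeeping. A minimal sketch, assuming annotations are available as (frame, tracklet_id, box) tuples; the real annotation file layout may differ.

```python
from collections import defaultdict

def build_tracklets(annotations):
    """Regroup per-frame annotations into per-identity tracklets.

    `annotations` is an iterable of (frame, tracklet_id, box) tuples,
    where box is (x, y, w, h).
    """
    tracklets = defaultdict(list)
    for frame, tid, box in annotations:
        tracklets[tid].append((frame, box))
    for track in tracklets.values():
        track.sort(key=lambda fb: fb[0])  # order each tracklet by frame
    return tracklets
```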
The Oxford Town Center dataset is a 5-minute video with 7,500 annotated frames for pedestrian detection, split into 6,500 frames for training and 1,000 for testing. The data was recorded from a CCTV camera in Oxford for research and development in activity and face recognition.
1 PAPER • 1 BENCHMARK