Track to Detect and Segment: An Online Multi-Object Tracker

Most online multi-object trackers perform object detection stand-alone in a neural network, without any input from tracking. In this paper, we present a new online joint detection and tracking model, TraDeS (TRAck to DEtect and Segment), which exploits tracking clues to assist detection end-to-end. TraDeS infers object tracking offsets from a cost volume and uses them to propagate previous object features, improving current object detection and segmentation. We demonstrate the effectiveness of TraDeS on four datasets: MOT (2D tracking), nuScenes (3D tracking), MOTS, and YouTube-VIS (instance segmentation tracking). Project page: https://jialianwu.com/projects/TraDeS.html.
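
The cost-volume step described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes dot-product similarity between per-pixel embeddings and a single joint soft-argmax over all previous-frame locations. The function names `cost_volume` and `tracking_offset` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def cost_volume(feat_t, feat_prev):
    """Dot-product similarity between every location of the current
    feature map and every location of the previous one.

    feat_t, feat_prev: arrays of shape (C, H, W).
    Returns an array of shape (H, W, H, W), indexed [i, j, k, l] for
    current location (i, j) and previous location (k, l).
    """
    return np.einsum("cij,ckl->ijkl", feat_t, feat_prev)

def tracking_offset(cost, temperature=1.0):
    """Soft-argmax over the cost volume (an assumed simplification).

    For each current location, compute the expected matching location
    in the previous frame and subtract the current location.
    Returns an array of shape (H, W, 2) holding (dy, dx).
    """
    H, W = cost.shape[:2]
    flat = cost.reshape(H, W, H * W) / temperature
    flat = flat - flat.max(axis=-1, keepdims=True)  # numerical stability
    prob = np.exp(flat)
    prob /= prob.sum(axis=-1, keepdims=True)
    prob = prob.reshape(H, W, H, W)

    ys, xs = np.arange(H), np.arange(W)
    exp_y = (prob.sum(axis=3) * ys).sum(axis=2)  # expected previous row
    exp_x = (prob.sum(axis=2) * xs).sum(axis=2)  # expected previous column
    cur_y, cur_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([exp_y - cur_y, exp_x - cur_x], axis=-1)
```

With offsets of this kind, features from the previous frame could be sampled at the matched locations and fused with the current features before detection. That fusion step is omitted here.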

CVPR 2021

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Instance Segmentation | Cityscapes test | | Average Precision | 20.2 | #9 |
| Multi-Object Tracking | DanceTrack | TraDeS | HOTA | 43.3 | #18 |
| | | | DetA | 74.5 | #14 |
| | | | AssA | 25.4 | #18 |
| | | | MOTA | 86.2 | #15 |
| | | | IDF1 | 41.2 | #18 |
| Online Multi-Object Tracking | MOT16 | TraDeS | MOTA | 67.7 | #2 |
| Multi-Object Tracking | MOT16 | TraDeS | MOTA | 70.1 | #9 |
| | | | IDF1 | 64.7 | #8 |
| Multi-Object Tracking | MOT17 | TraDeS | MOTA | 69.1 | #20 |
| | | | IDF1 | 63.9 | #23 |
| Multi-Object Tracking | MOTS20 | TraDeS | sMOTSA | 50.8 | #5 |
| | | | IDF1 | 58.7 | #3 |
| Instance Segmentation | nuScenes | TraDeS | MOTA | 68.2 | #1 |
| Video Instance Segmentation | YouTube-VIS validation | TraDeS | mask AP | 32.6 | #38 |
| | | | AP50 | 52.6 | #39 |
| | | | AP75 | 32.8 | #41 |
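
The DanceTrack results above show how HOTA combines detection and association quality. At a single localization threshold, HOTA is the geometric mean of DetA and AssA. The reported scores are averaged over several thresholds, so the geometric mean of the reported DetA and AssA only approximates the reported HOTA. The short check below uses the table values. The helper name `hota_from_det_ass` is hypothetical.

```python
import math

def hota_from_det_ass(det_a, ass_a):
    """Geometric mean of detection accuracy and association accuracy
    (the single-threshold form of HOTA)."""
    return math.sqrt(det_a * ass_a)

# DanceTrack DetA and AssA as reported in the results table.
approx = hota_from_det_ass(74.5, 25.4)
print(round(approx, 1))  # close to, but not exactly, the reported HOTA of 43.3
```

The low AssA relative to DetA indicates that on DanceTrack most of the lost HOTA comes from identity association rather than from detection.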

Methods