We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e., joint weakly supervised instance segmentation and multi-object tracking, in which no mask annotations are provided.
Training MOTSNet with our automatically extracted data leads to significantly improved sMOTSA scores on the novel KITTI MOTS dataset (+1.9%/+7.5% on cars/pedestrians), and MOTSNet improves by +4.1% over previously best methods on the MOTSChallenge dataset.
Additionally, we propose and analyse network distillation as a learning strategy to reduce the computational cost of the deep learning approach at test time.
When neural networks process images that do not resemble the distribution seen during training, so-called out-of-distribution images, they often make wrong predictions, and do so with excessive confidence.