124 papers with code • 3 benchmarks • 16 datasets
Visual Tracking is an essential and actively researched problem in the field of computer vision with various real-world applications such as robotic services, smart surveillance systems, autonomous driving, and human-computer interaction. It refers to the automatic estimation of the trajectory of an arbitrary target object, usually specified by a bounding box in the first frame, as it moves around in subsequent video frames.
Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time.
Moreover, we propose a new model architecture to perform depth-wise and layer-wise aggregations, which not only further improves the accuracy but also reduces the model size.
In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously.
It combines a Convolutional Neural Network (CNN) backbone and a cross-correlation operator, and takes advantage of the features from exemplary images for more accurate object tracking.
Visual object tracking has been a fundamental topic in recent years and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks.
Compared with short-term tracking, the long-term tracking task requires determining the tracked object is present or absent, and then estimating the accurate bounding box if present or conducting image-wide re-detection if absent.
In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach.