In this paper, we focus on learning distractor-aware Siamese networks for accurate and long-term tracking. During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors.
Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation. Online tracking is performed by evaluating the candidate windows randomly sampled around the previous target state.
Discriminant Correlation Filters (DCF) based methods now become a kind of dominant approach to online object tracking. In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously.
These regularities are hard to label for training supervised machine learning algorithms; consequently, algorithms need to learn these regularities from the real world in an unsupervised way. The results suggest a new class of AI algorithms that uniquely combine prediction and scalability in a way that makes them suitable for learning from and --- and eventually acting within --- the real world.
Compared with short-term tracking, the long-term tracking task requires determining the tracked object is present or absent, and then estimating the accurate bounding box if present or conducting image-wide re-detection if absent. Until now, few attempts have been done although this task is much closer to designing practical tracking systems.
By incorporating both temporal and spatial regularization, our STRCF can handle boundary effects without much loss in efficiency and achieve superior performance over SRDCF in terms of accuracy and speed. Compared with SRDCF, STRCF with hand-crafted features provides a 5 times speedup and achieves a gain of 5.4% and 3.6% AUC score on OTB-2015 and Temple-Color, respectively.
Template-matching methods for visual tracking have gained popularity recently due to their comparable performance and fast speed. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking.
Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time. A tracker must be able to modify its underlying model and adapt to new observations.
Cross-correlator plays a significant role in many visual perception tasks, such as object detection and tracking. Beyond the linear cross-correlator, this paper proposes a kernel cross-correlator (KCC) that breaks traditional limitations.
Unlike conventional orthogonal decision trees that use a single feature and heuristic measures to obtain a split at each node, we propose to use a more powerful proximal SVM to obtain oblique hyperplanes to capture the geometric structure of the data better. Furthermore, in order to generalize to online tracking scenarios, we derive incremental update steps that enable the hyperplanes in each node to be updated recursively, efficiently and in a closed-form fashion.