Inspired by the human "visual tracking" capability which leverages motion cues to distinguish the target from the background, we propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking, which successfully exploits both appearance and motion features for model update.
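A minimal sketch of the two-stream idea described above (the backbone choice, layer sizes, and head are illustrative assumptions, not the TS-RCN architecture): an appearance stream processes RGB frames, a motion stream processes optical flow, and the pooled features are fused before prediction.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoStreamTracker(nn.Module):
    """Illustrative two-stream network: one residual branch for RGB
    appearance, one for optical flow, fused by concatenation.
    (A sketch of the general idea, not the paper's exact model.)"""
    def __init__(self, flow_channels=2):
        super().__init__()
        self.appearance = models.resnet18(weights=None)
        self.motion = models.resnet18(weights=None)
        # Optical flow has 2 channels (dx, dy) instead of 3 (RGB).
        self.motion.conv1 = nn.Conv2d(flow_channels, 64, kernel_size=7,
                                      stride=2, padding=3, bias=False)
        # Drop the classification heads; keep the 512-d pooled features.
        self.appearance.fc = nn.Identity()
        self.motion.fc = nn.Identity()
        self.head = nn.Linear(512 * 2, 4)  # e.g., a bounding-box regressor

    def forward(self, rgb, flow):
        fused = torch.cat([self.appearance(rgb), self.motion(flow)], dim=1)
        return self.head(fused)

rgb = torch.randn(1, 3, 224, 224)
flow = torch.randn(1, 2, 224, 224)
print(TwoStreamTracker()(rgb, flow).shape)  # torch.Size([1, 4])
```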
Motion models play an important role in visual tracking applications, predicting the possible locations of objects in the next frame.
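As a concrete example (generic, not tied to any particular paper), the simplest such predictor is a constant-velocity model that extrapolates from the two most recent target centers:

```python
import numpy as np

def predict_next_center(prev_center, curr_center):
    """Constant-velocity prediction: assume the object keeps moving
    with the velocity observed between the last two frames."""
    velocity = np.asarray(curr_center) - np.asarray(prev_center)
    return np.asarray(curr_center) + velocity

# Object moved from (100, 50) to (110, 55); predict its center at t+1.
print(predict_next_center((100, 50), (110, 55)))  # [120  60]
```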
For high-level visual recognition, self-supervised learning defines and makes use of proxy tasks such as colorization and visual tracking to learn a semantic representation useful for distinguishing objects.
In recent years, visual tracking methods based on discriminative correlation filters (DCF) have shown great promise.
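For context, the canonical single-channel DCF learns a filter f by ridge regression over circular shifts of the template x, which admits a closed form in the Fourier domain (a standard MOSSE/KCF-style formulation with generic notation; conjugation conventions vary):

```latex
\min_{f}\; \bigl\| f \star x - y \bigr\|_2^2 + \lambda \|f\|_2^2
\quad\Longrightarrow\quad
\hat{f} = \frac{\hat{x}^{*} \odot \hat{y}}{\hat{x}^{*} \odot \hat{x} + \lambda}
```

where hats denote the DFT, * is complex conjugation, and ⊙ is the element-wise product.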
In recent years, background-aware correlation filters have attracted considerable research interest in visual target tracking.
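Schematically, the background-aware variant (in the spirit of BACF) modifies the DCF objective above by training over all circular shifts of a larger search region while cropping the filter to the target with a binary operator P, so that real background patches act as negative samples:

```latex
E(\mathbf{h}) = \frac{1}{2}\sum_{j=1}^{T}\Bigl\| y_j - \sum_{k=1}^{K} \mathbf{h}_k^{\top} P\, \mathbf{x}_k[\Delta\tau_j] \Bigr\|_2^2 + \frac{\lambda}{2}\sum_{k=1}^{K}\|\mathbf{h}_k\|_2^2
```

where x_k[Δτ_j] is the j-th circular shift of feature channel k and h_k the corresponding filter channel.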
Then, the proposed method extracts deep semantic information from a fully convolutional feature extraction network (FEN) and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters.
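A hedged sketch of such feature fusion (the bilinear resizing and concatenation are assumptions for illustration, not the paper's exact scheme): the coarse semantic map is resized to the spatial size of the ResNet feature map, then the two are stacked along the channel axis.

```python
import torch
import torch.nn.functional as F

def fuse_features(semantic_map, resnet_map):
    """Resize the coarse semantic map to match the ResNet feature map,
    then concatenate along channels. Illustrative only."""
    resized = F.interpolate(semantic_map, size=resnet_map.shape[-2:],
                            mode="bilinear", align_corners=False)
    return torch.cat([resized, resnet_map], dim=1)

semantic = torch.randn(1, 256, 7, 7)   # coarse, high-level features
resnet = torch.randn(1, 512, 14, 14)   # finer ResNet stage output
print(fuse_features(semantic, resnet).shape)  # torch.Size([1, 768, 14, 14])
```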
Visual tracking is typically solved as a discriminative learning problem that usually requires high-quality samples for online model adaptation.
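One common way to enforce sample quality during online adaptation (a generic sketch; the confidence score, threshold, and memory size are assumptions) is to gate which tracked frames are allowed into the training memory:

```python
def update_sample_memory(memory, sample, confidence,
                         threshold=0.8, capacity=50):
    """Keep only confidently tracked frames as training samples,
    discarding the oldest when the memory is full. Illustrative."""
    if confidence < threshold:
        return memory  # likely occluded or drifted: skip the update
    memory.append(sample)
    if len(memory) > capacity:
        memory.pop(0)
    return memory

memory = []
memory = update_sample_memory(memory, "frame_1_patch", confidence=0.95)
memory = update_sample_memory(memory, "frame_2_patch", confidence=0.40)
print(len(memory))  # 1 -- the low-confidence frame was rejected
```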
This strategy efficiently filters out irrelevant proposals and avoids redundant computation in feature extraction, enabling our method to operate faster than conventional classification-based tracking methods.
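A sketch of that two-stage idea (the scoring function and cut-off are hypothetical): an inexpensive score prunes the proposal set so that costly deep feature extraction runs only on the survivors.

```python
def filter_proposals(proposals, cheap_score, keep=8):
    """Rank proposals by an inexpensive score (e.g., closeness to the
    previous box) and keep only the top few, so expensive feature
    extraction is skipped for the rest. Illustrative."""
    ranked = sorted(proposals, key=cheap_score, reverse=True)
    return ranked[:keep]

boxes = [(x, x, 40, 40) for x in range(0, 100, 10)]  # 10 candidate boxes
survivors = filter_proposals(boxes, cheap_score=lambda b: -abs(b[0] - 50))
print(len(survivors))  # 8 of 10 proposals reach the deep network
```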
Despite these strong priors, we show that deep trackers often default to tracking by saliency detection rather than relying on the object instance representation.
In order to situate this new class of methods within the general picture of Mean Shift theory, we also give a concise exposition of existing results in this field.
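For reference, the classical mean-shift iteration moves the current estimate toward a kernel-weighted average of the data points (standard form, with kernel profile g and bandwidth h):

```latex
\mathbf{x}^{(t+1)} = \frac{\sum_{i} \mathbf{x}_i \, g\!\left(\left\|\tfrac{\mathbf{x}^{(t)}-\mathbf{x}_i}{h}\right\|^2\right)}{\sum_{i} g\!\left(\left\|\tfrac{\mathbf{x}^{(t)}-\mathbf{x}_i}{h}\right\|^2\right)}
```

The displacement x^(t+1) − x^(t) is the mean shift vector, which points in the direction of increasing kernel density.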