We show how most tracking tasks can be solved within this framework, and that the same appearance model achieves performance competitive with specialised methods on all five tasks considered.
Ranked #2 on Video Object Segmentation on DAVIS-2017 (mIoU metric)
Video super-resolution, which aims at producing a high-resolution video from its corresponding low-resolution version, has recently drawn increasing attention.
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.
In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model.
Ranked #9 on Multi-Object Tracking on MOT16 (using extra training data)
The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition.
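For reference, the plain softmax loss that this line refers to can be sketched as follows. This is a minimal illustration of the standard softmax cross-entropy only, not of any variant proposed in the paper; the function names `softmax` and `softmax_loss` are chosen here for illustration.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(logits, label):
    """Negative log-probability assigned to the ground-truth class index `label`."""
    return -math.log(softmax(logits)[label])

# A confident, correct prediction gives a small loss; a confident, wrong one gives a large loss.
good = softmax_loss([10.0, 0.0, 0.0], 0)
bad = softmax_loss([0.0, 10.0, 0.0], 0)
```

In embedding learning the logits are typically the products of the embedding with one weight vector per identity, so minimising this loss pulls an embedding toward its own class weight and away from the others.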
Generic object detection is one of the most fundamental problems in computer vision, yet it is difficult to provide bounding-box-level annotations for large-scale detection across thousands of categories.
Video-based person re-identification has drawn massive attention in recent years due to its extensive applications in video surveillance.
First, we present the modules of spatial attention, channel attention and aligned attention for single-stage object detection.
The visibility awareness allows VPM to extract region-level features and compare two images with focus on their shared regions (which are visible on both images).
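One way to read "compare two images with focus on their shared regions" is a distance in which each region is weighted by how visible it is in both images. The sketch below works under that assumption; the function name, the per-region visibility scores in [0, 1], and the Euclidean region distance are illustrative choices, not details taken from the excerpt.

```python
import math

def shared_region_distance(feats_a, vis_a, feats_b, vis_b):
    """Visibility-weighted average of per-region Euclidean distances.

    feats_*: list of region feature vectors (one list of floats per region).
    vis_*:   list of visibility scores in [0, 1], one per region (assumed inputs).
    """
    num, den = 0.0, 0.0
    for fa, va, fb, vb in zip(feats_a, vis_a, feats_b, vis_b):
        weight = va * vb  # near zero unless the region is visible in BOTH images
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
        num += weight * dist
        den += weight
    if den == 0.0:
        return float("inf")  # no shared visible region: treat the pair as incomparable
    return num / den

# Region 1 is occluded in image A, so only region 0 contributes to the distance.
d = shared_region_distance(
    [[0.0, 0.0], [1.0, 1.0]], [1.0, 0.0],
    [[3.0, 4.0], [9.0, 9.0]], [1.0, 1.0],
)
```

With this weighting, a region occluded in either image cannot inflate the distance, which is the intuition behind comparing partial and holistic pedestrian images.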
The key idea is that the local context in the feature space around an instance (a face) contains rich information about the linkage relationship between that instance and its neighbors.
Comprehensive experiments demonstrate that our proposed method can handle various blur kernels and achieves state-of-the-art results for the restoration of small, blurry face images.
In classification adaptation, we transfer a pre-trained network to a multi-label classification task for recognizing the presence of a certain object in an image.
In this paper, we address this problem by progressive domain adaptation with two main steps: classification adaptation and detection adaptation.
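The classification adaptation step treats object presence as a multi-label problem: each category gets its own independent yes/no output. A common objective for that setting is a per-class sigmoid with binary cross-entropy, sketched below. The excerpt does not state which loss the authors use, so this is an assumed, generic formulation, and the function names are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multilabel_bce(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over classes; targets are 0/1 presence labels."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return loss / len(logits)

# An image containing classes 0 and 2 but not class 1.
targets = [1, 0, 1]
matching = multilabel_bce([4.0, -4.0, 4.0], targets)
mismatched = multilabel_bce([-4.0, 4.0, -4.0], targets)
```

Unlike a softmax, the per-class sigmoid lets several categories be present at once, which matches image-level labels that only say whether a certain object appears somewhere in the image.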