Based on deep networks, video object detection is actively studied for pushing the limits of detection speed and accuracy.
The learning problem of the sample generation (i. e., diversity transfer) is solved via minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works.
To reciprocate these two tasks, we design a two-stream structure to learn features on both the object level (i. e., bounding boxes) and the pixel level (i. e., instance masks) jointly.
Ranked #54 on Instance Segmentation on COCO test-dev
In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).
Without any bells and whistles, our method obtains 80. 3\% mAP on the ImageNet VID dataset, which is superior over the previous state-of-the-arts.
Video object segmentation (VOS) aims at pixel-level object tracking given only the annotations in the first frame.
Ranked #1 on Visual Object Tracking on YouTube-VOS 2018 (Jaccard (Seen) metric)
In this paper, we study this problem and propose Mask Scoring R-CNN which contains a network block to learn the quality of the predicted instance masks.
Ranked #39 on Instance Segmentation on COCO test-dev
In our implementations, architectures are first searched on a small dataset, e. g., CIFAR-10.
We propose a novel deep network called Mancs that solves the person re-identification problem from the following aspects: fully utilizing the attention mechanism for the person misalignment problem and properly sampling for the ranking loss to obtain more stable person representation.
The separation of the task requires to define a hand-crafted training goal in affinity learning stage and a hand-crafted cost function of data association stage, which prevents the tracking goals from learning directly from the feature.
To address this issue, we propose the Reinforced Evolutionary Neural Architecture Search (RE- NAS), which is an evolutionary method with the reinforced mutation for NAS.
We study the problem of unsupervised domain adaptive re-identification (re-ID) which is an active topic in computer vision but lacks a theoretical foundation.
Ranked #12 on Unsupervised Domain Adaptation on Market to Duke
Tactical driving decision making is crucial for autonomous driving systems and has attracted considerable interest in recent years.
Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.
With character candidates detected by cascade boosting, the min-cost flow network model integrates the last three sequential steps into a single process which solves the error accumulation problem at both character level and text line level effectively.
While deep convolutional neural networks (CNNs) have shown a great success in single-label image classification, it is important to note that real world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image.
While feedforward deep convolutional neural networks (CNNs) have been a great success in computer vision, it is important to remember that the human visual contex contains generally more feedback connections than foward connections.
The recent development in learning deep representations has demonstrated its wide applications in traditional vision tasks like classification and detection.
To demonstrate the effectiveness of our approach, we collect a large-scale real-world clothing classification dataset with both noisy and clean labels.
Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding.
Ranked #31 on Real-Time Semantic Segmentation on Cityscapes test
Learning Mahanalobis distance metrics in a high- dimensional feature space is very difficult especially when structural sparsity and low rank are enforced to improve com- putational efficiency in testing phase.