Knowledge distillation is an effective tool to compress large pre-trained Convolutional Neural Networks (CNNs) or their ensembles into models applicable to mobile and embedded devices.
Meanwhile, mutual instance selection further selects reliable and informative instances for training, according to the peer confidence and relationship disagreement of the two networks.
To address this, we propose an automatic approach that directly clones whole outfits from real-world person images onto virtual 3D characters, so that any virtual person thus created appears very similar to its real-world counterpart.
As for the data, we show that existing autonomous driving benchmarks are monotonous in nature: they are neither diverse in scenarios nor dense in pedestrians.
Meanwhile, we design an adaptive BN layer in the domain-invariant stream to approximate the statistics of various unseen domains.
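The core idea of an adaptive BN layer can be sketched as follows: instead of normalizing test data with statistics stored during training, the layer recomputes mean and variance from the current (unseen-domain) batch. This is a minimal NumPy sketch under our own naming; the actual layer in the paper may differ in detail.

```python
import numpy as np

def adaptive_bn(x, gamma, beta, eps=1e-5):
    """Adaptive batch normalization (illustrative sketch).

    Normalizes feature maps x of shape (N, C, H, W) with the statistics
    of the *current* batch rather than stored training statistics, which
    approximates the statistics of an unseen target domain."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # per-channel mean
    var = x.var(axis=(0, 2, 3), keepdims=True)     # per-channel variance
    x_hat = (x - mean) / np.sqrt(var + eps)        # normalize
    # Affine parameters gamma/beta are still the learned ones.
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```

Because the normalization statistics come from the batch itself, the same trained weights can be reused across domains whose feature distributions shift.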
Third, by investigating the advantages of both anchor-based and anchor-free models, we further augment AlignPS with an ROI-Align head, which significantly improves the robustness of re-id features while still keeping our model highly efficient.
Ranked #1 on Person Search on PRW
While single-view 3D reconstruction has made significant progress in recent years, benefiting from deep shape representations, garment reconstruction remains far from solved due to open surfaces, diverse topologies, and complex geometric details.
This paper introduces weakly supervised person search with only bounding-box annotations.
Last but not least, we introduce a region feature alignment and an instance discriminator to learn domain-invariant features for object proposals.
In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
To this end, we first propose a baseline model equipped with a single transformer decoder as the detection head.
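A transformer-decoder detection head can be sketched as learned object queries cross-attending to the flattened backbone feature map, followed by small prediction heads. The sketch below is single-head and purely illustrative; all weight names are our own assumptions, not the paper's.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decoder_detection_head(feat, queries, Wq, Wk, Wv, W_box, W_cls):
    """Single-head cross-attention detection head (sketch).

    feat:    (HW, D) flattened backbone feature map
    queries: (Q, D) learned object queries
    Returns (Q, 4) box regressions and (Q, C) class logits."""
    q = queries @ Wq                    # project queries
    k = feat @ Wk                       # project feature-map keys
    v = feat @ Wv                       # project feature-map values
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (Q, HW) attention
    ctx = attn @ v                      # per-query attended context
    return ctx @ W_box, ctx @ W_cls     # predict boxes and classes
```

Each query thus gathers evidence from the whole image and emits one candidate detection, which is the basic mechanism behind decoder-style detection heads.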
Although online hard example mining has improved learning efficiency to some extent, mining within randomly sampled mini-batches remains limited.
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images, which can be regarded as the unified task of pedestrian detection and person re-identification (re-id).
Ranked #6 on Person Search on CUHK-SYSU
This paper proposes a human-parsing-based texture transfer model that uses cross-view consistency learning to generate the texture of a 3D human body from a single image.
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
To address this, we propose to automatically synthesize a large-scale person re-identification dataset following a set-up similar to real surveillance but with virtual environments, and then use the synthesized person images to train a generalizable person re-identification model.
This paper proposes a novel lightweight module, the Attentive WaveBlock (AWB), which can be integrated into the dual networks of mutual learning to enhance their complementarity and further suppress noise in the pseudo-labels.
Ranked #2 on Unsupervised Domain Adaptation on Market to MSMT
Furthermore, we show that diverse and dense datasets, collected by crawling the web, serve as an effective source of pre-training for pedestrian detection.
Ranked #1 on Pedestrian Detection on CityPersons (using extra training data)
The proposed model is equipped with a novel detection head based on heatmap regression, which conducts score and offset predictions simultaneously on low-resolution feature maps.
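Decoding such a head can be sketched as thresholding the score heatmap and refining each coarse grid cell with its predicted offset. The stride, threshold, and function name below are illustrative assumptions.

```python
import numpy as np

def decode_heatmap(scores, offsets, stride=4, thresh=0.5):
    """Decode center detections from a low-resolution score heatmap and
    per-cell offset maps (sketch of heatmap-regression decoding).

    scores:  (H, W) center-score heatmap
    offsets: (2, H, W) sub-cell (dx, dy) offsets
    Returns a list of (cx, cy, score) in input-image coordinates."""
    ys, xs = np.where(scores > thresh)        # candidate peak cells
    dets = []
    for y, x in zip(ys, xs):
        cx = (x + offsets[0, y, x]) * stride  # refine coarse x position
        cy = (y + offsets[1, y, x]) * stride  # refine coarse y position
        dets.append((cx, cy, float(scores[y, x])))
    return dets
```

The offset branch is what lets a low-resolution map still localize precisely: the heatmap picks the cell, and the offsets recover the sub-cell position lost to downsampling.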
In this paper, beyond representation learning, we consider how to formulate person image matching directly in deep feature maps.
Like edge, corner, blob, and other feature detectors, the proposed detector scans for feature points over the entire image, a task to which convolution is naturally suited.
Ranked #6 on Pedestrian Detection on Caltech (using extra training data)
To address this problem, we propose a pixel-aware deep function-mixture network for SSR, which is composed of a new class of modules, termed function-mixture (FM) blocks.
Specifically, the shared basic architecture in the first stage is a short, densely connected convolutional neural network that extracts basic feature maps from a square input vehicle image.
Ranked #3 on Vehicle Re-Identification on VehicleID Small (mAP metric)
However, current single-stage detectors (e.g., SSD) have not achieved competitive accuracy on common pedestrian detection benchmarks.
Ranked #9 on Pedestrian Detection on Caltech (using extra training data)
Then, a robust image representation based on color names is obtained by concatenating the statistical descriptors in each stripe.
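The stripe-wise construction can be sketched as a per-stripe distribution over color-name indices, concatenated into one vector. The input encoding and parameter names below are assumptions for illustration.

```python
import numpy as np

def stripe_color_name_descriptor(cn_map, n_stripes=6, n_names=16):
    """Concatenate color-name statistics over horizontal stripes (sketch).

    cn_map: (H, W) array where each pixel holds a color-name index
            in [0, n_names).
    Returns a (n_stripes * n_names,) descriptor."""
    h = cn_map.shape[0]
    parts = []
    for i in range(n_stripes):
        stripe = cn_map[i * h // n_stripes:(i + 1) * h // n_stripes]
        hist = np.bincount(stripe.ravel(), minlength=n_names).astype(float)
        parts.append(hist / max(hist.sum(), 1.0))  # normalized histogram
    return np.concatenate(parts)
```

Horizontal stripes keep coarse vertical layout (head, torso, legs) while discarding horizontal position, which is what makes such descriptors robust to viewpoint change.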
However, to achieve this, existing deep models typically adopt image pairs or triplets to form a verification loss, which is inefficient and unstable since the number of possible pairs or triplets grows rapidly with the amount of training data.
In this paper, a deep hybrid similarity learning (DHSL) method for person re-ID based on a convolutional neural network (CNN) is proposed.
From this point of view, selecting suitable positive (i.e., intra-class) training samples within a local range is critical for training the CNN embedding, especially when the data have large intra-class variations.
We address a new partial person re-identification (re-id) problem, where only a partial observation of a person is available for matching across different non-overlapping camera views.
We argue that the PSD constraint provides a useful regularization to smooth the solution of the metric, and hence the learned metric is more robust than without the PSD constraint.
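A standard way to enforce the PSD constraint on a learned metric matrix is to project it onto the PSD cone after each update, by clipping negative eigenvalues to zero. A minimal sketch:

```python
import numpy as np

def project_psd(M):
    """Project a symmetric metric matrix onto the PSD cone (sketch).

    Eigendecompose M and clip negative eigenvalues to zero, yielding
    the nearest PSD matrix in Frobenius norm. This is the usual way to
    enforce the PSD constraint during metric learning."""
    M = (M + M.T) / 2.0                 # symmetrize against drift
    w, V = np.linalg.eigh(M)            # eigenvalues ascending
    return (V * np.maximum(w, 0.0)) @ V.T
```

Discarding the negative eigendirections removes the components along which the "metric" would assign negative squared distances, which is the regularizing smoothing effect argued for above.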
In this paper, we propose a novel CNN-based method to learn a discriminative metric with good robustness to the over-fitting problem in person re-identification.
A new line of approaches to the problem has emerged that matches features of different modalities directly.
First, a new image feature called Normalized Pixel Difference (NPD) is proposed.
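To make the feature concrete: the NPD of two pixel intensities x and y is (x - y) / (x + y), defined as 0 when both are 0, which makes it invariant to a common illumination scale. A minimal sketch:

```python
def npd(x, y):
    """Normalized Pixel Difference between two pixel intensities:
    f(x, y) = (x - y) / (x + y), with f(0, 0) defined as 0.
    Scale invariant: f(k*x, k*y) == f(x, y) for any k > 0."""
    x, y = float(x), float(y)
    return 0.0 if x + y == 0.0 else (x - y) / (x + y)
```

The value is bounded in [-1, 1] and antisymmetric (f(y, x) = -f(x, y)), properties that make it convenient as a weak feature for boosted detectors.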
Ranked #7 on Face Detection on PASCAL Face
Person re-identification is becoming a hot research topic for both machine learning algorithms and video surveillance applications.
In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA).
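The "maximal occurrence" idea in LOMO can be sketched in one line: within each horizontal band, local histograms are computed in sliding subwindows, and each bin keeps its maximum over all subwindows at the same height, which tolerates horizontal viewpoint shifts. An illustrative sketch:

```python
import numpy as np

def lomo_max_occurrence(hists):
    """Core of the Local Maximal Occurrence representation (sketch).

    hists: (n_windows, n_bins) local histograms from sliding subwindows
           within one horizontal band of the person image.
    Returns the per-bin maximal occurrence over the band, which is
    robust to where along the band a pattern appears."""
    return hists.max(axis=0)
```

The full LOMO descriptor concatenates such band-level maxima over multiple bands and feature channels; the sketch shows only the pooling step that gives the method its name.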
Ranked #80 on Person Re-Identification on DukeMTMC-reID
For the NIR-VIS problem, we achieve new state-of-the-art performance on the CASIA HFB and NIR-VIS 2.0 databases.
The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background.