Context-Aware Feature Learning for Noise Robust Person Search
Person search aims to localize and identify specific pedestrians in large collections of surveillance scene images. In this work, we focus on noise in person search, which we categorize into scene-inherent noise and human-introduced noise. Scene-inherent noise comes from congestion, occlusion, and illumination changes. Human-introduced noise originates from the labeling process. For scene-inherent noise, we propose a novel context contrastive loss that takes advantage of the latent contextual information in scene images. Features from context regions are used to construct contrastive pairs that enforce feature discrimination among pedestrians in a scene image while maintaining feature consistency for the same identity. The network thus learns to distinguish congested and overlapping pedestrians and obtains more robust features. For human-introduced noise, we propose a noise-discovery and noise-suppression training process for mislabeling-robust person search. After the first training pass, the relations between the feature prototypes of different identities are analyzed to discover mislabeled pedestrians. During the second training pass, the label noise is suppressed to reduce the negative influence of mislabeled data. Experiments show that the proposed context-aware noise-robust (CANR) person search achieves competitive performance. Further ablation studies confirm the effectiveness of CANR.
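The two ideas summarized above can be illustrated with a minimal sketch. It is not the CANR implementation: the abstract gives neither the form of the context contrastive loss nor the rule used to analyze prototypes. The sketch assumes an InfoNCE-style loss with context-region features as negatives. It also assumes that identity pairs with nearly identical mean-feature prototypes are candidates for label noise. All function names, the `temperature` value, and the `threshold` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def context_contrastive_loss(anchor, positive, context_feats, temperature=0.1):
    """Assumed InfoNCE-style loss (not the paper's exact formulation).

    Pulls the anchor toward a feature of the same identity and pushes it
    away from features of context regions taken from the same scene.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, c) / temperature) for c in context_feats)
    return -math.log(pos / (pos + neg))


def identity_prototypes(features, labels):
    """Prototype of each identity, assumed here to be the mean feature."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def suspicious_identity_pairs(prototypes, threshold=0.9):
    """Flag identity pairs whose prototypes are nearly identical.

    Assumption: such pairs may be one person annotated under two labels.
    The threshold is a hypothetical hyperparameter.
    """
    ids = sorted(prototypes)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if cosine(prototypes[a], prototypes[b]) > threshold
    ]
```

In a two-pass scheme of this kind, the flagged pairs from the first pass could be merged or down-weighted during the second pass. How CANR actually suppresses the discovered noise is not stated in the abstract.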