A Simple and Robust Correlation Filtering Method for Text-based Person Search

Text-based person search aims to associate pedestrian images with natural language descriptions. In this task, extracting discriminative representations and aligning them across identities and descriptions is an essential yet challenging problem. Most previous methods depend on external language parsers or vision techniques to select relevant regions or words from noisy inputs, but these incur heavy computational cost and inevitable error accumulation. Meanwhile, simply segmenting images horizontally to obtain local-level features harms the reliability of models as well. In this paper, we present a novel end-to-end Simple and Robust Correlation Filtering (SRCF) method which can effectively extract key clues and adaptively align the discriminative features. Different from previous works, our framework focuses on computing the similarity between templates and inputs. In particular, we design two types of filtering modules (i.e., denoising filters and dictionary filters) to extract crucial features and establish multi-modal mappings. Extensive experiments show that our method improves the robustness of the model and achieves better performance on two text-based person search datasets. Source code is available at https://github.com/Suo-Wei/SRCF.
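The core idea of filtering by template similarity can be illustrated with a minimal sketch. This is not the authors' SRCF implementation; the function names, the use of cosine similarity, and the simple max-over-templates weighting are all assumptions chosen for illustration (the actual modules are learned end-to-end):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1e-12
    nv = math.sqrt(sum(b * b for b in v)) or 1e-12
    return dot / (nu * nv)

def correlation_filter(features, templates):
    """Illustrative template-based filtering (not the paper's exact module).

    Each input feature (e.g. a word or region embedding) is weighted by its
    peak similarity to a shared bank of template vectors, so tokens that
    match no template are suppressed as noise.

    features:  list of d-dim feature vectors
    templates: list of d-dim template vectors (learned in the real model;
               assumed given here)
    Returns (filtered features, per-feature relevance scores).
    """
    scores = [max(cosine(f, t) for t in templates) for f in features]
    filtered = [[s * x for x in f] for s, f in zip(scores, features)]
    return filtered, scores
```

Because the same template bank is shared across modalities, weighting visual and textual features against it is one simple way such a design can align the two without an external parser.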

Task                         Dataset     Model  Metric  Value  Global Rank
Text-based Person Retrieval  CUHK-PEDES  SRCF   R@1     64.88  # 6
Text-based Person Retrieval  CUHK-PEDES  SRCF   R@5     83.02  # 5
Text-based Person Retrieval  CUHK-PEDES  SRCF   R@10    88.56  # 6
Text-based Person Retrieval  ICFG-PEDES  SRCF   R@1     57.18  # 4