Cross-Modality Transformer for Visible-Infrared Person Re-Identification

Visible-infrared person re-identification (VI-ReID) is a challenging task due to the large cross-modality discrepancies and intra-class variations. Existing works mainly focus on learning modality-shared representations by embedding different modalities into the same feature space. However, these methods usually damage the modality-specific information and identification information contained in the features. To alleviate the above issues, we propose a novel Cross-Modality Transformer (CMT) to jointly explore a modality-level alignment module and an instance-level module for VI-ReID. The proposed CMT enjoys several merits. First, the modality-level alignment module is designed to compensate for the missing modality-specific information via a Transformer encoder-decoder architecture. Second, we propose an instance-level alignment module to adaptively adjust the sample features, which is achieved by a query-adaptive feature modulation. To the best of our knowledge, this is the first work to exploit a cross-modality transformer to achieve the modality compensation for VI-ReID. Extensive experimental results on two standard benchmarks demonstrate that our CMT performs favorably against the state-of-the-art methods.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods