An Improved Deep Learning Architecture for Person Re-Identification
In this work, we propose a method for simultaneously learning features and a corresponding similarity metric for person re-identification. We present a deep convolutional architecture with layers specially designed to address the problem of re-identification. Given a pair of images as input, our network outputs a similarity value indicating whether the two input images depict the same person. Novel elements of our architecture include a layer that computes cross-input neighborhood differences, which capture local relationships between the two input images based on mid-level features computed separately from each image. A high-level summary of the outputs of this layer is computed by a layer of patch summary features, which are then spatially integrated in subsequent layers. Our method significantly outperforms the state of the art on both a large dataset (CUHK03) and a medium-sized dataset (CUHK01), and it is resistant to overfitting. We also demonstrate that by initially training on an unrelated large dataset before fine-tuning on a small target dataset, our network can achieve results comparable to the state of the art even on a small dataset (VIPeR).
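The idea behind the cross-input neighborhood difference layer can be illustrated with a minimal sketch. The abstract does not give the formula, so the details here are assumptions made for illustration: the function name `neighborhood_differences`, the default neighborhood size `k=5`, and the zero padding at the borders. The sketch handles a single feature-map channel stored as a nested list; each value from the first image's feature map is compared with the surrounding neighborhood at the same location in the second image's feature map.

```python
def neighborhood_differences(f, g, k=5):
    """Compare each value of feature map f with a k x k neighborhood of g.

    f and g are equally sized 2-D lists (one feature-map channel each).
    Returns out, where out[y][x] is a k x k grid holding
    f[y][x] - g[y + dy][x + dx] for all offsets in the neighborhood.
    Positions outside g are treated as zero (zero padding, an assumption).
    """
    if k % 2 != 1:
        raise ValueError("k must be odd so the neighborhood has a center")
    height, width = len(f), len(f[0])
    same_shape = (
        len(g) == height
        and all(len(row) == width for row in f)
        and all(len(row) == width for row in g)
    )
    if not same_shape:
        raise ValueError("f and g must have the same shape")
    r = k // 2

    def g_at(y, x):
        # Zero padding: anything outside the map counts as 0.
        return g[y][x] if 0 <= y < height and 0 <= x < width else 0.0

    return [
        [
            [
                [f[y][x] - g_at(y + dy, x + dx) for dx in range(-r, r + 1)]
                for dy in range(-r, r + 1)
            ]
            for x in range(width)
        ]
        for y in range(height)
    ]
```

For example, `neighborhood_differences(f, g, k=3)` on two 2 x 2 maps returns a 2 x 2 grid of 3 x 3 difference blocks, and the center entry of each block is simply `f[y][x] - g[y][x]`. Calling the function with the arguments swapped gives the comparison in the other direction. Comparing against a neighborhood rather than a single position is what lets such a layer tolerate small spatial misalignments between the two views.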