Learning View-Invariant Features for Person Identification in Temporally Synchronized Videos Taken by Wearable Cameras
In this paper, we study the problem of Cross-View Person Identification (CVPI), which aims at identifying the same person from temporally synchronized videos taken by different wearable cameras. Our basic idea is to utilize human motion consistency for CVPI, where human motion can be computed by optical flow. However, optical flow is view-variant: the same person's optical flow in different videos can be very different due to view-angle changes. In this paper, we attempt to utilize 3D human-skeleton sequences to learn a model that can extract view-invariant motion features from optical flows in different views. For this purpose, we use a 3D Mocap database to build a synthetic optical flow dataset and train a Triplet Network (TN) consisting of three sub-networks: two for optical flow sequences from different views and one for the underlying 3D Mocap skeleton sequence. Finally, the sub-networks for optical flow are used to extract view-invariant features for CVPI. Experimental results show that, using only motion information, the proposed method achieves performance comparable to that of state-of-the-art methods. Further combining the proposed method with an appearance-based method achieves new state-of-the-art performance.
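The following is a minimal sketch (not the authors' released code) of the triplet setup described above: two branches embed optical-flow sequences from two views, a third branch embeds the underlying 3D Mocap skeleton sequence, and a consistency loss pulls all three embeddings of the same motion together. All layer sizes, tensor shapes, the exact loss form, and the choice to share weights between the two flow branches are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowBranch(nn.Module):
    """Embeds a stack of optical-flow frames (2 channels x T frames) into a feature vector."""
    def __init__(self, t_frames=16, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * t_frames, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, x):
        return F.normalize(self.fc(self.conv(x).flatten(1)), dim=1)

class SkeletonBranch(nn.Module):
    """Embeds a 3D skeleton sequence (T frames x J joints x 3 coords) into the same space."""
    def __init__(self, joints=20, t_frames=16, feat_dim=128):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(t_frames * joints * 3, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, s):
        return F.normalize(self.fc(s.flatten(1)), dim=1)

def consistency_loss(f_view1, f_view2, f_skel):
    """Pull both flow embeddings toward the shared skeleton embedding and toward each other."""
    return (F.mse_loss(f_view1, f_skel)
            + F.mse_loss(f_view2, f_skel)
            + F.mse_loss(f_view1, f_view2))

# Example forward pass with hypothetical shapes: a batch of 4 synthetic sequences,
# 16 frames of 64x64 optical flow per view, and 20 skeleton joints per frame.
flow_net, skel_net = FlowBranch(), SkeletonBranch()
flow_v1 = torch.randn(4, 32, 64, 64)   # view-1 optical flow (2 * 16 channels)
flow_v2 = torch.randn(4, 32, 64, 64)   # view-2 optical flow
skel = torch.randn(4, 16, 20, 3)       # underlying Mocap skeleton sequence
loss = consistency_loss(flow_net(flow_v1), flow_net(flow_v2), skel_net(skel))
```

At test time, only the flow branch would be needed: features extracted from optical flow in each camera view are compared to decide whether two synchronized videos show the same person.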