However, training two networks with a set of noisy pseudo labels reduces the complementarity of the two networks and results in label noise accumulation.
Recent deep-learning-based video synthesis approaches, in particular applications that can forge identities such as "DeepFake", have raised great security concerns.
Assessing action quality from videos has attracted growing attention in recent years.
Ranked #4 on Action Quality Assessment on AQA-7
Person Re-IDentification (P-RID), as an instance-level recognition problem, remains challenging in the computer vision community.
Current Convolutional Neural Network (CNN)-based object detection models adopt strictly feedforward inference to predict the final detection results.
Multi-shot person re-identification (MsP-RID) utilizes multiple images from the same person to facilitate identification.
In contrast to these methods, this paper advocates a different paradigm: part of the learning can be performed online but with nominal costs, so as to achieve online metric adaptation for different input probes.
A common treatment is to use the same local reconstruction in the two spaces, i.e., the reconstruction weights in the appearance space are transferred to the gaze space for gaze reconstruction.
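The weight-transfer idea described above can be illustrated with a minimal sketch. It assumes LLE-style reconstruction weights (constrained to sum to one, with a small ridge term) computed over the k nearest training samples in appearance space; the function names `reconstruction_weights` and `transfer_gaze`, the parameters `k` and `reg`, and the toy data are hypothetical and do not come from the paper.

```python
import numpy as np

def reconstruction_weights(query, neighbors, reg=1e-3):
    """Weights w (summing to 1) that approximately minimize
    ||query - w @ neighbors||^2, solved as in locally linear embedding.
    (Assumed formulation, not taken from the source.)"""
    k = len(neighbors)
    diffs = neighbors - query                 # (k, d) offsets from the query
    gram = diffs @ diffs.T                    # (k, k) local Gram matrix
    # Ridge term keeps the system solvable when the Gram matrix is singular.
    gram = gram + reg * (np.trace(gram) / k + 1e-12) * np.eye(k)
    w = np.linalg.solve(gram, np.ones(k))
    return w / w.sum()

def transfer_gaze(query, train_appearance, train_gaze, k=3, reg=1e-3):
    """Reconstruct the query in appearance space, then reuse the same
    weights on the neighbors' gaze labels."""
    dists = np.linalg.norm(train_appearance - query, axis=1)
    idx = np.argsort(dists)[:k]               # k nearest appearance neighbors
    w = reconstruction_weights(query, train_appearance[idx], reg)
    return w @ train_gaze[idx], w

# Toy data: gaze is an affine function of a 2-D appearance feature.
A = np.array([[2.0, -1.0], [0.5, 3.0]])
b = np.array([0.1, -0.2])
train_appearance = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
train_gaze = train_appearance @ A + b
query = np.array([0.25, 0.25])
estimated_gaze, weights = transfer_gaze(query, train_appearance, train_gaze)
```

In this toy setting the gaze is exactly affine in the appearance feature, so the transferred weights recover the query's gaze closely; with real data the two spaces need not share the same local structure, which is the limitation of the common treatment.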