Although deep learning techniques have largely improved face recognition, unconstrained surveillance face recognition (FR) is still an unsolved challenge, due to the limited training data and the gap of domain distribution.
We find that FER models remember noisy samples by focusing on a part of the features that can be considered related to the noisy labels instead of learning from the whole features that lead to the latent truth.
Ranked #1 on Facial Expression Recognition on FERPlus
Convolutional neural network based face forgery detection methods have achieved remarkable results during training, but struggled to maintain comparable performance during testing.
To solve this problem, we propose a pose augmentation solution via DH forward kinematics model, which we call DH-AUG. We observe that the previous work is all based on single-frame pose augmentation, if it is directly applied to video pose estimator, there will be several previously ignored problems: (i) angle ambiguity in bone rotation (multiple solutions); (ii) the generated skeleton video lacks movement continuity.
Motivated by this key observation, we propose a framework for face forgery detection and categorization consisting of: 1) a Spatial-Temporal Filtering Network (STFNet) for PPG signals filtering, and 2) a Spatial-Temporal Interaction Network (STINet) for constraint and interaction of PPG signals.
Despite great progress in face recognition tasks achieved by deep convolution neural networks (CNNs), these models often face challenges in real world tasks where training images gathered from Internet are different from test images because of different lighting condition, pose and image quality.
The cycle label-consistent loss reinforces the consistency between ground-truth labels and pseudo-labels of source samples leading to statistically similar latent representations between source and target domains.
In this paper, we investigate the face privacy protection from a technology standpoint based on a new type of customized cloak, which can be applied to all the images of a regular user, to prevent malicious face recognition systems from uncovering their identity.
Deep face recognition has achieved great success due to large-scale training databases and rapidly developing loss functions.
Ranked #1 on Face Verification on LFW (Accuracy metric)
We introduce the Oracle-MNIST dataset, comprising of 28$\times $28 grayscale images of 30, 222 ancient characters from 10 categories, for benchmarking pattern classification, with particular challenges on image noise and distortion.
Second, transformation is achieved via swapping the learned textures across domains and a classifier for final classification is trained to predict the labels of the transformed scanned characters.
Finally, to mitigate the algorithmic bias, we propose a novel meta-learning algorithm, called Meta Balanced Network (MBN), which learns adaptive margins in large margin loss such that the model optimized by this loss can perform fairly across people with different skin tones.
A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features for a stylized feature space.
Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.
On this basis, we reveal a "noisy distillation" problem stemming from the noise in dreaming memory, and further propose to augment distillation in a pairwise and cross-wise pattern over different views of memory to mitigate it.
With increasing appealing to privacy issues in face recognition, federated learning has emerged as one of the most prevalent approaches to study the unconstrained face recognition problem with private decentralized data.
To quantify these uncertainties and achieve good performance under noisy data, we regard uncertainty as a relative concept and propose an innovative uncertainty learning method called Relative Uncertainty Learning (RUL).
Ranked #2 on Facial Expression Recognition on RAF-DB
As more and more people begin to wear masks due to current COVID-19 pandemic, existing face recognition systems may encounter severe performance degradation when recognizing masked faces.
Ranked #1 on Face Recognition on MLFW
Although vanilla Convolutional Neural Network (CNN) based detectors can achieve satisfactory performance on fake face detection, we observe that the detectors tend to seek forgeries on a limited region of face, which reveals that the detectors is short of understanding of forgery.
Face photo-sketch synthesis and recognition has many applications in digital entertainment and law enforcement.
The training of a deep face recognition system usually faces the interference of label noise in the training data.
Inspired by the effectiveness of pseudo-labels in domain adaptation, we propose a reinforcement learning based selective pseudo-labeling method for semi-supervised domain adaptation.
In this paper, we propose a simple yet effective approach, named Point Adversarial Self Mining (PASM), to improve the recognition accuracy in facial expression recognition.
Much of the work on automatic facial expression recognition relies on databases containing a certain number of emotion classes and their exaggerated facial configurations (generally six prototypical facial expressions), based on Ekman's Basic Emotion Theory.
However, the performance of the current state-of-the-art facial expression recognition (FER) approaches is directly related to the labeled data for training.
In particular, the existence of transferable adversarial examples can severely hinder the robustness of DCNNs since this type of attacks can be applied in a fully black-box manner without queries on the target system.
To encourage fairness, we introduce the idea of adaptive margin to learn balanced performance for different races based on large margin losses.
In this paper, we train a validation classifier to normalize the decision threshold, which means that the result can be obtained directly without replacing the threshold.
The Deep neural networks (DNNs) have achieved great success on a variety of computer vision tasks, however, they are highly vulnerable to adversarial attacks.
Then, rethinking person ReID as a zero-shot learning problem, we propose the Mixed High-Order Attention Network (MHN) to further enhance the discrimination and richness of attention knowledge in an explicit manner.
Ranked #4 on Person Re-Identification on CUHK03-C
In zero-shot image retrieval (ZSIR) task, embedding learning becomes more attractive, however, many methods follow the traditional metric learning idea and omit the problems behind zero-shot settings.
Datasets play an important role in the progress of facial expression recognition algorithms, but they may suffer from obvious biases caused by different cultures and collection conditions.
In this paper, different from the approaches on learning the loss structures, we propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning.
However, in this paper, we first emphasize that the generalization ability is a core ingredient of this 'good' embedding as well and largely affects the metric performance in zero-shot settings as a matter of fact.
Racial bias is an important issue in biometric, but has not been thoroughly studied in deep face recognition.
Recently, learning discriminative features to improve the recognition performances gradually becomes the primary goal of deep learning, and numerous remarkable works have emerged.
We assume that problems inside are inadequate use of supervision information and imbalance between semantics and details at all level feature maps in CNN even with Feature Pyramid Networks (FPN).
Deep embedding learning becomes more attractive for discriminative feature learning, but many methods still require hard-class mining, which is computationally complex and performance-sensitive.
We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets.
Labeled Faces in the Wild (LFW) database has been widely utilized as the benchmark of unconstrained face verification and due to big data driven machine learning methods, the performance on the database approaches nearly 100%.
In this paper, we first emphasize that the early saturation behavior of softmax will impede the exploration of SGD, which sometimes is a reason for model converging at a bad local-minima, then propose Noisy Softmax to mitigating this early saturation issue by injecting annealed noise in softmax during each iteration.
Past research on facial expressions have used relatively limited datasets, which makes it unclear whether current methods can be employed in real world.
In this paper, our generative model trained with synthetic images rendered from 3D models reduces the workload of data collection and limitation of conditions.
In this paper, we propose a multi-manifold deep metric learning (MMDML) method for image set classification, which aims to recognize an object of interest from a set of image instances captured from varying viewpoints or under varying illuminations.
We extend the classical linear discriminant analysis (LDA) technique to linear ranking analysis (LRA), by considering the ranking order of classes centroids on the projected subspace.
The success of sparse representation based classification (SRC) has largely boosted the research of sparsity based face recognition in recent years.