Distributional uncertainty arises broadly in real-world applications, one form of which is domain discrepancy.
Some existing methods adopt distribution learning to tackle this issue by exploiting the semantic correlation between age labels.
Deep video models, for example 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video.
Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames.
State of health (SOH) is a crucial indicator of a battery's degradation level; it cannot be measured directly and must instead be estimated.
However, we argue that these memory structures are neither efficient nor sufficient because of two implicit operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing longer-range temporal information.
In addition, we demonstrate the utility of the visualization results in three ways: (1) We visualize attention with respect to the connectionist temporal classification (CTC) loss to train an ASR model with adversarial attention-erasing regularization, which effectively decreases the word error rate (WER) of the model and improves its generalization capability.
To address these two challenges, in this paper, we propose an Interpretable Spatial-Temporal Video Transformer (ISTVT), which consists of a novel decomposed spatial-temporal self-attention and a self-subtract mechanism to capture spatial artifacts and temporal inconsistency for robust Deepfake detection.
In this work, we propose a novel LiDAR localization framework, SGLoc, which decouples pose estimation into point cloud correspondence regression followed by pose estimation from the regressed correspondences.
Beyond classification, Conv-Adapter can generalize to detection and segmentation tasks with more than 50% reduction of parameters but comparable performance to the traditional full fine-tuning.
In this work, we explore such an impact by theoretically proving that selecting unlabeled data of higher gradient norm leads to a lower upper bound on the test loss, resulting in better test performance.
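The selection criterion described above can be sketched cheaply: for softmax cross-entropy with a pseudo-label, the gradient of the loss with respect to the logits is `p - onehot(y)`, so its norm serves as an inexpensive proxy for the full gradient norm. The following is a minimal NumPy sketch under that assumption; the function names are illustrative, not the paper's API.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def select_by_grad_norm(logits, k):
    """Pick the k unlabeled samples whose pseudo-labeled loss gradient
    w.r.t. the logits has the largest norm (a proxy criterion)."""
    p = softmax(logits)
    y = p.argmax(axis=1)                    # pseudo-labels
    g = p.copy()
    g[np.arange(len(g)), y] -= 1.0          # gradient: p - onehot(y)
    norms = np.linalg.norm(g, axis=1)
    return np.argsort(-norms)[:k]           # indices with top-k norms
```

Confident predictions (one dominant logit) yield near-zero gradients and are skipped, while uncertain samples are selected first.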
To address this problem, we propose a new Deformable Patch (DePatch) module which learns to adaptively split the images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches.
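The geometric part of the idea above, replacing a fixed patch grid with data-dependent patches, can be sketched as follows. This is a hypothetical simplification (the names and box parameterization are assumptions, not the DePatch implementation): each patch's center is shifted by a predicted offset and its extent rescaled by a predicted scale.

```python
import numpy as np

def deformable_patch_boxes(centers, size, offsets, scales):
    """Turn a regular patch grid into data-dependent patches.

    centers: (N, 2) fixed grid centers; offsets: (N, 2) predicted shifts;
    scales: (N,) predicted per-patch scale factors; size: base patch size.
    Returns (N, 4) boxes as (x1, y1, x2, y2).
    """
    c = centers + offsets                   # learned per-patch shift
    half = 0.5 * size * scales[:, None]     # learned per-patch extent
    return np.hstack([c - half, c + half])
```

With zero offsets and unit scales this reduces exactly to the predefined fixed patches, so the deformable module strictly generalizes the regular split.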
Encouraged by the success, we propose a novel One-Shot Path Aggregation Network Architecture Search (OPANAS) algorithm, which significantly improves both searching efficiency and detection accuracy.
IR-Softmax can generalise to any softmax and its variants (which are discriminative for open-set problems) by directly setting the weights to their class centers, naturally solving the data imbalance problem.
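The core move, replacing learned classifier weights with class centers, can be sketched in a few lines. This is a minimal illustration under assumed conventions (L2-normalized features and centers, cosine logits); it is not the paper's exact formulation.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Mean feature vector per class: the 'centers' used as weights."""
    centers = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        centers[c] = features[labels == c].mean(axis=0)
    return centers

def ir_softmax_logits(features, centers):
    """Logits with the softmax weights set directly to the class centers
    (both L2-normalized), instead of independently learned weights."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return f @ w.T  # cosine similarity of each sample to each center
```

Because the centers are computed from the data rather than learned per class, a rare class still gets a well-placed weight vector, which is how this construction sidesteps class imbalance.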
Despite recent advances in deep learning-based face frontalization methods, photo-realistic and illumination preserving frontal face synthesis is still challenging due to large pose and illumination discrepancy during training.
To this end, we propose a certainty-based reusable sample selection and correction approach, termed as CRSSC, for coping with label noise in training deep FG models with web images.
In this paper, we propose Differentiable Automatic Data Augmentation (DADA) which dramatically reduces the cost.
Since deep networks are capable of memorizing the entire dataset, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks.
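Vanilla MixUp, as referenced above, convex-combines each sample with a shuffled partner using a mixing coefficient drawn from a Beta distribution; the `alpha` parameter is the "interpolation policy" whose bad choices can corrupt samples. A minimal NumPy sketch:

```python
import numpy as np

def mixup(x, y_onehot, alpha=0.2, rng=None):
    """Vanilla MixUp: lam ~ Beta(alpha, alpha), then convex-combine
    each sample (and its one-hot label) with a shuffled partner."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))           # random pairing within batch
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[idx]
    return x_mix, y_mix, lam
```

A small `alpha` concentrates `lam` near 0 or 1 (mild mixing), while `alpha` near 1 makes heavily blended, potentially corrupted samples much more likely.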
Despite this, we notice that the semantic ambiguity greatly degrades the detection performance.
For pixels missing from both half-faces, we present a generative reconstruction subnet together with a perceptual symmetry loss to enforce symmetry consistency of the recovered structures.
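A perceptual symmetry loss of the kind mentioned above can be sketched as the distance between deep features of the recovered face and of its horizontal mirror. The sketch below is a hypothetical simplification: `feat_fn` stands in for a pretrained feature extractor, and the exact loss form is an assumption.

```python
import numpy as np

def symmetry_loss(feat_fn, img):
    """Penalize the feature-space distance between an image and its
    horizontal mirror, encouraging symmetric recovered structures."""
    flipped = img[:, ::-1]                  # horizontal flip (H x W [x C])
    f, f_mirror = feat_fn(img), feat_fn(flipped)
    return float(np.mean((f - f_mirror) ** 2))
```

A perfectly symmetric reconstruction incurs zero loss, so the gradient only pushes on asymmetric residuals rather than on the image content itself.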
Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch.
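The contrast with hard mining is that every sample keeps a nonzero, continuous weight. The sketch below illustrates that idea with a softmax over per-sample losses; this scoring function is a hypothetical stand-in, not OSM's actual score.

```python
import numpy as np

def soft_mining_weights(losses, temperature=1.0):
    """Assign each mini-batch sample a continuous positive score that
    sums to 1, so all samples contribute (higher-loss samples more),
    instead of hard-selecting a subset and discarding the rest."""
    s = (losses - losses.max()) / temperature  # shift for stability
    w = np.exp(s)
    return w / w.sum()
```

The temperature controls how sharply the weighting concentrates on hard samples; as it grows, the scheme approaches uniform weighting.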
To advance subtle expression recognition, we contribute a Large-scale Subtle Emotions and Mental States in the Wild database (LSEMSW).
To solve this problem, we establish a theoretical equivalence between tensor optimisation and a two-stream gated neural network.
We propose a novel investment decision strategy (IDS) based on deep learning.
The paper presents a dictionary integration algorithm using 3D morphable face models (3DMM) for pose-invariant collaborative-representation-based face classification.
Training such networks requires very large training sets with millions of labeled images.
In this paper, we present the Surrey Face Model, a multi-resolution 3D Morphable Model that we make available to the public for non-commercial purposes.
In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible.
Diabetes is considered a lifestyle disease, and well-managed self-care plays an important role in its treatment.