To address the task, we first analyze the theory of the multi-domain learning, which highlights that 1) mitigating the impact of domain gap and 2) exploiting all samples to train the model can effectively reduce the generalization error in each source domain so as to improve the quality of pseudo-labels.
Specifically, we propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.
Partial label learning (PLL) aims to train multi-class classifiers from instances with partial labels (PLs)-a PL for an instance is a set of candidate labels where a fixed but unknown candidate is the true label.
To cope with the challenge, we investigate single-positive multi-label learning (SPMLL) where each example is annotated with only one relevant label, and show that one can successfully learn a theoretically grounded multi-label classifier for the problem.
Besides, for the label distribution of each class, we further revise it to give more and equal attention to the other domains that the class does not belong to, which can effectively reduce the domain gap across different domains and obtain the domain-invariant feature.
Domain adaptation (DA) tries to tackle the scenarios when the test data does not fully follow the same distribution of the training data, and multi-source domain adaptation (MSDA) is very attractive for real world applications.
Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels and model the generation process of the candidate labels in a simple way.
Subsequently, we design the graph attention transformer layer to transfer this adjacency matrix to adapt to the current domain.
Different from the conventional data augmentation, the proposed domain-aware mix-normalization to enhance the diversity of features during training from the normalization view of the neural network, which can effectively alleviate the model overfitting to the source domains, so as to boost the generalization capability of the model in the unseen domain.
The objective of image manipulation detection is to identify and locate the manipulated regions in the images.
A significance of our work lies in that it shows the potential of unsupervised domain generalization for person ReID and sets a strong baseline for the further research on this topic.
Traditionally, AQA is treated as a regression problem to learn the underlying mappings between videos and action scores.
Ranked #1 on Action Quality Assessment on JIGSAWS
In this paper, we consider instance-dependent PLL and assume that each example is associated with a latent label distribution constituted by the real number of each label, representing the degree to each label describing the feature.
Moreover, the learngene, i. e., the gene for learning initialization rules of the target model, is proposed to inherit the meta-knowledge from the collective model and reconstruct a lightweight individual model on the target task.
Partial-label learning (PLL) utilizes instances with PLs, where a PL includes several candidate labels but only one is the true label (TL).
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds.
In this paper, we present a compact learning (CL) framework to embed the features and labels simultaneously and with mutual guidance.
Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels.
The effectiveness of our approach has been demonstrated on both facial age and attractiveness estimation tasks.
Ranked #1 on Age Estimation on ChaLearn 2016
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
To achieve the camera alignment, we develop a Multi-Camera Adversarial Learning (MCAL) to map images of different cameras into a shared subspace.
In this paper, a novel Context-and-Spatial Aware Network (CSANet), which integrates both a Context Aware Path and Spatial Aware Path, is proposed to obtain effective features involving both context information and spatial information.
To solve this problem, we assume that each multi-label instance is described by a vector of latent real-valued labels, which can reflect the importance of the corresponding labels.
However, it is difficult to collect sufficient training images with precise labels in some domains such as apparent age estimation, head pose estimation, multi-label classification and semantic segmentation.
Ranked #1 on Head Pose Estimation on BJUT-3D
In order to learn this general model family, this paper uses a method called Logistic Boosting Regression (LogitBoost) which can be seen as an additive weighted function regression from the statistical viewpoint.