AUC-Oriented Domain Adaptation: From Theory to Algorithm

The Area Under the ROC Curve (AUC) is a crucial metric in machine learning and is often a natural choice for applications such as disease prediction and fraud detection, where the datasets typically exhibit a long-tailed label distribution. However, most existing AUC-oriented learning methods assume that the training and test data are drawn from the same distribution, and how to handle domain shift remains largely open. This paper presents an early attempt at AUC-oriented Unsupervised Domain Adaptation (UDA), denoted AUCUDA henceforth. Specifically, we first construct a generalization bound that exploits a new distributional discrepancy tailored to AUC. The critical challenge is that the AUC risk cannot be expressed as a sum of independent loss terms, since it is defined over positive-negative example pairs, which makes the standard theoretical techniques inapplicable. We propose a new result that not only addresses this interdependency but also yields a much sharper bound under weaker assumptions on the loss function. Turning theory into practice, the original discrepancy requires complete annotations on the target domain, which is incompatible with UDA. To resolve this, we propose a pseudo-labeling strategy and present an end-to-end training framework. Finally, empirical studies on five real-world datasets demonstrate the efficacy of our framework.
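Since the page carries no implementation details, the following is a minimal sketch of the two ingredients the abstract names: a pairwise surrogate of the AUC risk, which makes the interdependency of its loss terms explicit, and a confidence-threshold pseudo-labeling step for the unlabeled target domain. All names here (`pairwise_auc_risk`, `pseudo_label_split`, the logistic surrogate, the threshold value) are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F


def pairwise_auc_risk(scores_pos: torch.Tensor, scores_neg: torch.Tensor) -> torch.Tensor:
    """Logistic surrogate of the AUC risk over all positive-negative pairs.

    Every positive score is compared with every negative score, so each
    example appears in many pair terms: the loss terms are interdependent,
    which is why standard i.i.d.-based generalization arguments do not
    apply directly to the AUC risk.
    """
    diff = scores_pos.unsqueeze(1) - scores_neg.unsqueeze(0)  # shape (P, N)
    return F.softplus(-diff).mean()  # log(1 + exp(-(s_pos - s_neg)))


def pseudo_label_split(scores_tgt: torch.Tensor, threshold: float = 0.8):
    """Split unlabeled target-domain scores into pseudo-positive/negative sets.

    A confidence threshold keeps only examples the current model is sure
    about and discards the rest for this round -- one common heuristic,
    assumed here rather than taken from the paper.
    """
    probs = torch.sigmoid(scores_tgt)
    pseudo_pos = scores_tgt[probs >= threshold]
    pseudo_neg = scores_tgt[probs <= 1.0 - threshold]
    return pseudo_pos, pseudo_neg


if __name__ == "__main__":
    # Usage sketch: labeled source AUC risk plus pseudo-labeled target AUC risk.
    torch.manual_seed(0)
    src_pos = torch.randn(16) + 1.0   # scores of labeled source positives
    src_neg = torch.randn(16) - 1.0   # scores of labeled source negatives
    tgt_scores = torch.randn(32)      # scores on unlabeled target examples

    loss = pairwise_auc_risk(src_pos, src_neg)
    tgt_pos, tgt_neg = pseudo_label_split(tgt_scores)
    if len(tgt_pos) > 0 and len(tgt_neg) > 0:
        loss = loss + pairwise_auc_risk(tgt_pos, tgt_neg)
    print(float(loss))
```

In an end-to-end setup, a loss of this shape would be backpropagated through the scoring model jointly with whatever discrepancy term the bound motivates; the sketch above only illustrates the pairwise structure of the objective.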
