There is a fundamental limitation in the prediction performance that a machine learning model can achieve due to the inevitable uncertainty of the prediction target.
To address this challenge, we propose dictionary-based heterogeneous graph neural network (DHGNet) that effectively handles the heterogeneity of DHG by two-step aggregations, which are word-level and language-level aggregations.
When minimizing the empirical risk in binary classification, it is a common practice to replace the zero-one loss with a surrogate loss to make the learning objective feasible to optimize.
In this paper, we first prove that the focal loss is classification-calibrated, i. e., its minimizer surely yields the Bayes-optimal classifier and thus the use of the focal loss in classification can be theoretically justified.
The goal of classification with rejection is to avoid risky misclassification in error-critical applications such as medical diagnosis and product inspection.
Robust learning from noisy demonstrations is a practical but highly challenging problem in imitation learning.
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances instead of individual instances, while the goal is still to predict labels of unseen individuals.
If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework.
We consider a document classification problem where document labels are absent but only relevant keywords of a target class and unlabeled documents are given.
Although learning from triplet comparison data has been considered in many applications, an important fundamental question of whether we can learn a classifier only from triplet comparison data has remained unanswered.
Appropriately evaluating the discrepancy between domains is essential for the success of unsupervised domain adaptation.
First, we consider an approach based on simultaneous training of a classifier and a rejector, which achieves the state-of-the-art performance in the binary case.
Based on the analysis of the Bayes optimal classifier, we show that given a test class prior, PU classification under class prior shift is equivalent to PU classification with asymmetric error.
A previously proposed discrepancy that does not use the source domain labels requires high computational cost to estimate and may lead to a loose generalization error bound in the target domain.