1 code implementation • 1 Dec 2020 • Burak Yildiz, Hayley Hung, Jesse H. Krijthe, Cynthia C. S. Liem, Marco Loog, Gosia Migut, Frans Oliehoek, Annibale Panichella, Przemyslaw Pawelczak, Stjepan Picek, Mathijs de Weerdt, Jan van Gemert
We present ReproducedPapers.org: an open online repository for teaching and structuring machine learning reproducibility.
In their thought-provoking paper, Belkin et al. illustrate and discuss the shape of risk curves in the context of modern high-complexity learners.
Cross-validation under sample selection bias can, in principle, be done by importance-weighting the empirical risk.
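A minimal sketch of this idea, assuming the importance weights w(x) = p_test(x)/p_train(x) are already available (here they are simulated); the synthetic data, the logistic regression model, and the weight construction are illustrative assumptions, not part of the original paper.

```python
# Importance-weighted cross-validation: each validation loss is weighted by the
# (assumed known) density ratio p_test(x) / p_train(x) and then normalized.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
w = np.exp(0.8 * X[:, 0])  # hypothetical importance weights p_test / p_train

def weighted_cv_error(X, y, w, n_splits=5):
    """Importance-weighted estimate of the test error under sample selection bias."""
    losses, weight_sums = [], []
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        miss = (clf.predict(X[val_idx]) != y[val_idx]).astype(float)
        losses.append(np.sum(w[val_idx] * miss))   # weight each validation loss
        weight_sums.append(np.sum(w[val_idx]))
    return np.sum(losses) / np.sum(weight_sums)    # normalized importance-weighted risk

print("importance-weighted CV error:", weighted_cv_error(X, y, w))
```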
In various approaches to learning, notably in domain adaptation, active learning, learning under covariate shift, semi-supervised learning, learning with concept drift, and the like, one often wants to compare a baseline classifier to one or more advanced (or at least different) strategies.
In particular, we show the relation between the bound of the state-of-the-art Maximum Mean Discrepancy (MMD) active learner, the bound of the Discrepancy, and a new and looser bound that we refer to as the Nuclear Discrepancy bound.
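For illustration only, the snippet below computes the squared MMD between a candidate query batch and an unlabeled pool with an RBF kernel; this is merely the quantity the MMD active learner keeps small, not the bounds or the query-selection strategy discussed in the paper, and the kernel bandwidth and data are assumptions.

```python
# Biased estimate of MMD^2 between a candidate batch and the remaining pool.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # pairwise squared Euclidean distances, then RBF kernel values
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def squared_mmd(A, B, gamma=0.5):
    """Biased estimate of MMD^2 between samples A and B."""
    return (rbf_kernel(A, A, gamma).mean()
            + rbf_kernel(B, B, gamma).mean()
            - 2 * rbf_kernel(A, B, gamma).mean())

rng = np.random.default_rng(1)
pool = rng.normal(size=(200, 3))                   # unlabeled pool
batch = pool[rng.choice(200, 10, replace=False)]   # candidate query batch
print("MMD^2 between batch and pool:", squared_mmd(batch, pool))
```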
In this paper, we discuss the approaches we took and trade-offs involved in making a paper on a conceptual topic in pattern recognition research fully reproducible.
For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it.
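A small self-contained simulation (not taken from the paper) can make this peaking behaviour visible for the minimum-norm least squares classifier; the dimensionality, class separation, and sample sizes below are arbitrary choices.

```python
# Illustrative peaking: with d = 50 features, the test error of the
# pseudo-inverse least squares classifier can rise as n approaches d
# before it decreases again for larger training sets.
import numpy as np

rng = np.random.default_rng(2)
d, n_test = 50, 2000
mean = np.full(d, 0.3)                       # two Gaussian classes, slightly separated

def sample(n):
    y = rng.integers(0, 2, n) * 2 - 1        # labels in {-1, +1}
    X = rng.normal(size=(n, d)) + np.outer(y, mean)
    return X, y

X_test, y_test = sample(n_test)
for n in [10, 25, 45, 50, 55, 100, 400]:
    X, y = sample(n)
    w = np.linalg.pinv(X) @ y                # minimum-norm least squares fit
    err = np.mean(np.sign(X_test @ w) != y_test)
    print(f"n = {n:4d}  test error = {err:.3f}")
```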
The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples.
For semi-supervised techniques to be applied safely in practice, we at least want methods to outperform their supervised counterparts.
Experimental results show that performance improvements can also be expected in the general multidimensional case, both in terms of the squared loss intrinsic to the classifier and in terms of the expected classification error.
Our empirical evaluation of FLDA focuses on problems comprising binary and count data in which the transfer can be naturally modeled via a dropout distribution, which allows the classifier to adapt to differences in the marginal probability of features in the source and the target domain.
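The following is a deliberately simplified sketch of the general idea of modelling source-to-target transfer as per-feature dropout, not the FLDA estimator itself; the thinning model, the moment-based dropout estimate, and the logistic regression classifier are all assumptions made for illustration.

```python
# Simplified dropout-transfer illustration: per-feature dropout rates are read
# off from the ratio of target to source marginal feature means, and the source
# data is thinned accordingly before training a standard classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, d = 1000, 20
X_src = rng.poisson(3.0, size=(n, d))                  # source count features
y_src = (X_src[:, 0] > 3).astype(int)
theta = rng.uniform(0.1, 0.6, size=d)                  # true per-feature dropout
X_tgt = rng.binomial(X_src, 1 - theta)                 # target data: thinned source

# Estimate dropout rates from the marginal means (assumes the thinning model holds).
theta_hat = np.clip(1 - X_tgt.mean(0) / X_src.mean(0), 0, 1)

# Adapt by applying the estimated dropout to the source data during training.
X_adapted = rng.binomial(X_src, 1 - theta_hat)
clf = LogisticRegression(max_iter=1000).fit(X_adapted, y_src)
print("accuracy on dropout-shifted data:", clf.score(X_tgt, y_src))
```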
Using any one of these methods is not guaranteed to outperform the supervised classifier, which does not take the additional unlabeled data into account.