Consistency and Finite Sample Behavior of Binary Class Probability Estimation
In this work we investigate to which extent one can recover class probabilities within the empirical risk minimization (ERM) paradigm. The main aim of our paper is to extend existing results and emphasize the tight relations between empirical risk minimization and class probability estimation. Based on existing literature on excess risk bounds and proper scoring rules, we derive a class probability estimator based on empirical risk minimization. We then derive fairly general conditions under which this estimator will converge, in the L1-norm and in probability, to the true class probabilities. Our main contribution is to present a way to derive finite sample L1-convergence rates of this estimator for different surrogate loss functions. We also study in detail which commonly used loss functions are suitable for this estimation problem and finally discuss the setting of model-misspecification as well as a possible extension to asymmetric loss functions.
PDF Abstract