1 code implementation • 15 Feb 2024 • Matthew J. Holland
In this work, we consider the notion of "criterion collapse," in which optimization of one metric implies optimality in another, with a particular focus on conditions for collapse into error probability minimizers under a wide variety of learning criteria, ranging from DRO and OCE risks (CVaR, tilted ERM) to non-monotonic criteria underlying recent ascent-descent algorithms explored in the literature (Flooding, SoftAD).
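For context, the Flooding criterion named above replaces the empirical risk with an objective that switches between ascent and descent around a user-set "flood level"; a standard form (due to Ishida et al., given here only for orientation, with SoftAD being a softened variant) is:

```latex
% Flooding objective: descend when the empirical risk exceeds the flood level b > 0,
% ascend when it falls below; \widehat{R} is the usual empirical risk.
\[
  \widehat{R}_{\mathrm{flood}}(\theta)
  \;=\;
  \bigl|\,\widehat{R}(\theta) - b\,\bigr| + b,
  \qquad
  \widehat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\ell(\theta; z_i).
\]
```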
1 code implementation • 16 Oct 2023 • Matthew J. Holland, Kosuke Nakatani
As models grow larger and more complex, achieving better off-sample generalization with minimal trial-and-error is critical to the reliability and economy of machine learning workflows.
1 code implementation • 27 Jan 2023 • Matthew J. Holland
Under losses which are potentially heavy-tailed, we consider the task of minimizing sums of the loss mean and standard deviation, without trying to accurately estimate the variance.
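Schematically, the objective being minimized is a mean-plus-standard-deviation sum; the trade-off weight $\eta \ge 0$ below is illustrative notation rather than the paper's:

```latex
% Mean plus standard deviation of the loss; the paper's point is that this can be
% pursued without accurately estimating the variance term itself.
\[
  \min_{\theta}\;\; \mathbb{E}\bigl[\ell(\theta; Z)\bigr]
  \;+\; \eta\,\sqrt{\operatorname{Var}\bigl[\ell(\theta; Z)\bigr]}.
\]
```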
1 code implementation • 28 Mar 2022 • Matthew J. Holland
Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been proposed and studied, but these risks are all at least as sensitive as the mean to loss tails on the upside, and tend to ignore deviations on the downside.
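For orientation, textbook forms of the three risks named above are as follows (parameterizations may differ from those used in the paper); here $L = \ell(\theta; Z)$, $\alpha \in (0,1)$, $t > 0$, and $\mathcal{U}(P)$ is an ambiguity set of distributions around the data distribution $P$:

```latex
\[
  \mathrm{CVaR}_{\alpha}(L) \;=\; \min_{c \in \mathbb{R}}
    \Bigl\{ c + \tfrac{1}{1-\alpha}\,\mathbb{E}\bigl[(L - c)_{+}\bigr] \Bigr\},
  \qquad
  \mathrm{Tilt}_{t}(L) \;=\; \tfrac{1}{t}\,\log \mathbb{E}\bigl[e^{t L}\bigr],
\]
\[
  \mathrm{DRO}_{\mathcal{U}}(L) \;=\; \sup_{Q \in \mathcal{U}(P)} \mathbb{E}_{Q}\bigl[L\bigr].
\]
```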
no code implementations • 11 Oct 2021 • Matthew J. Holland, Kazuki Tanabe
Virtually all machine learning tasks are characterized using some form of loss function, and "good performance" is typically stated in terms of a sufficiently small average loss, taken over the random draw of test data.
1 code implementation • 24 May 2021 • Matthew J. Holland
Under data distributions which may be heavy-tailed, many stochastic gradient-based learning algorithms are driven by gradient feedback queried at points which, taken on their own, carry almost no performance guarantees.
1 code implementation • 11 May 2021 • Matthew J. Holland, El Mehdi Haress
In this work, we consider the setting of learning problems under a wide class of spectral risk (or "L-risk") functions, where a Lipschitz-continuous spectral density is used to flexibly assign weight to extreme loss values.
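In standard notation (assumed here), a spectral or L-risk weights the loss quantile function $F_{L}^{-1}$ by a spectral density $\sigma$; the paper additionally assumes $\sigma$ is Lipschitz continuous:

```latex
\[
  R_{\sigma}(L) \;=\; \int_{0}^{1} F_{L}^{-1}(u)\,\sigma(u)\,\mathrm{d}u,
  \qquad
  \sigma \ge 0,\;\; \sigma \text{ non-decreasing},\;\; \int_{0}^{1}\sigma(u)\,\mathrm{d}u = 1.
\]
```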
1 code implementation • 14 Dec 2020 • Matthew J. Holland
We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed, though this will be unknown to the learner.
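As a point of reference, one common RGD-style baseline aggregates per-sample gradients robustly before each update; the sketch below uses a coordinate-wise median-of-means and is illustrative only, not the paper's exact procedure (function and parameter names are assumptions):

```python
import numpy as np

def median_of_means(values, num_blocks=5):
    """Coordinate-wise median-of-means over rows of `values` (shape: n_samples x dim)."""
    blocks = np.array_split(values, num_blocks, axis=0)
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

def robust_gd(grad_fn, theta, data, steps=100, lr=0.1, num_blocks=5):
    """Gradient descent driven by a robust aggregate of per-sample gradients.

    grad_fn(theta, z) should return the gradient of the loss at a single sample z.
    """
    for _ in range(steps):
        per_sample = np.stack([grad_fn(theta, z) for z in data])
        theta = theta - lr * median_of_means(per_sample, num_blocks)
    return theta
```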
1 code implementation • 4 Dec 2020 • Matthew J. Holland
In this work, we study a new class of risks defined in terms of the location and deviation of the loss distribution, generalizing far beyond classical mean-variance risk functions.
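For illustration, the "location" of a loss distribution can be defined through an M-estimator rather than the mean; the following fixed-point sketch uses a Huber-type reweighting, with parameter names and the specific choice of estimator being assumptions rather than the paper's notation:

```python
import numpy as np

def m_location(losses, c=1.345, tol=1e-8, max_iter=100):
    """Huber-type M-estimate of the location of a loss sample,
    computed via iteratively reweighted averaging."""
    x = np.asarray(losses, dtype=float)
    s = np.median(np.abs(x - np.median(x))) + 1e-12   # robust scale via MAD
    theta = np.median(x)                              # robust initial guess
    for _ in range(max_iter):
        r = (x - theta) / s
        w = np.ones_like(r)
        big = np.abs(r) > c
        w[big] = c / np.abs(r[big])                   # Huber weights: downweight large residuals
        new_theta = np.sum(w * x) / np.sum(w)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta
```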
no code implementations • 9 Jul 2020 • Matthew J. Holland
In this work, we study some novel applications of conformal inference techniques to the problem of providing machine learning procedures with more transparent, accurate, and practical performance guarantees.
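As background on conformal inference, the best-known instance is split conformal prediction; the regression sketch below is purely illustrative and not the specific procedures developed in the paper (it assumes any fit/predict-style regression `model`):

```python
import numpy as np

def split_conformal_interval(model, X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction intervals with >= (1 - alpha) marginal coverage."""
    model.fit(X_train, y_train)                       # fit on the proper training split only
    scores = np.abs(y_cal - model.predict(X_cal))     # absolute-residual nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))           # conservative finite-sample rank
    q = np.sort(scores)[min(k, n) - 1]                # calibration quantile
    preds = model.predict(X_test)
    return preds - q, preds + q                       # lower and upper interval endpoints
```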
no code implementations • 3 Jun 2020 • Matthew J. Holland, El Mehdi Haress
We study learning algorithms that seek to minimize the conditional value-at-risk (CVaR), when all the learner knows is that the losses incurred may be heavy-tailed.
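A standard baseline here is the Rockafellar-Uryasev formulation, which minimizes CVaR jointly over the model parameters and an auxiliary threshold; the sketch below is illustrative (names and step sizes are assumptions) and is not the heavy-tail-robust algorithm analyzed in the paper:

```python
import numpy as np

def cvar_sgd(loss_fn, grad_fn, theta, data, alpha=0.95, lr=0.01, epochs=50):
    """Stochastic subgradient descent on  c + E[(loss - c)_+] / (1 - alpha),
    jointly over the model parameters theta and the threshold c."""
    c = 0.0
    for _ in range(epochs):
        for idx in np.random.permutation(len(data)):
            z = data[idx]
            over = 1.0 if loss_fn(theta, z) > c else 0.0   # subgradient of the hinge (loss - c)_+
            theta = theta - lr * (over / (1.0 - alpha)) * grad_fn(theta, z)
            c = c - lr * (1.0 - over / (1.0 - alpha))
    return theta, c
```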
no code implementations • 2 Jun 2020 • Matthew J. Holland
Empirically, we also show that under heavy-tailed losses, the proposed procedure cannot simply be replaced with naive cross-validation.
no code implementations • 1 Jun 2020 • Matthew J. Holland
We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when the gradients can be heavy-tailed, though this will be unknown to the learner.
1 code implementation • 25 Jun 2019 • Matthew J. Holland
We consider the problem of mean estimation assuming only finite variance.
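A classic point of reference under a finite-variance assumption is a Catoni-type M-estimator of the mean; the sketch below uses a simplified textbook scaling and a crude plug-in variance, and is shown only as background rather than the estimator proposed in the paper:

```python
import numpy as np

def catoni_mean(x, delta=0.01, var_bound=None, max_iter=60, tol=1e-10):
    """Catoni-type M-estimator of the mean: solves sum_i psi(s * (x_i - theta)) = 0
    by bisection, with psi(u) = sign(u) * log(1 + |u| + u^2 / 2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    v = np.var(x) if var_bound is None else var_bound           # crude plug-in if no bound given
    s = np.sqrt(2.0 * np.log(1.0 / delta) / (n * (v + 1e-12)))  # simplified scaling choice
    psi = lambda u: np.sign(u) * np.log1p(np.abs(u) + 0.5 * u ** 2)
    lo, hi = x.min(), x.max()                                   # the root lies in this range
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if psi(s * (x - mid)).sum() > 0:   # estimating equation is decreasing in theta
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```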
Statistics Theory
no code implementations • 20 May 2019 • Matthew J. Holland
We derive PAC-Bayesian learning guarantees for heavy-tailed losses, and obtain a novel optimal Gibbs posterior which enjoys finite-sample excess risk bounds at logarithmic confidence.
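For orientation, the bounded-loss prototype of such guarantees is a McAllester-style PAC-Bayes bound, and the bound-optimizing Gibbs posterior takes an exponential-reweighting form; the heavy-tailed analysis modifies both, and the constants below are the standard bounded-loss ones:

```latex
% For losses in [0,1]: with probability at least 1 - delta over the n-sample,
% uniformly over all posteriors rho (pi is the prior),
\[
  \mathbb{E}_{h \sim \rho}\bigl[R(h)\bigr]
  \;\le\;
  \mathbb{E}_{h \sim \rho}\bigl[\widehat{R}_{n}(h)\bigr]
  + \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \log(2\sqrt{n}/\delta)}{2n}},
\]
% and the Gibbs posterior optimizing such bounds reweights the prior exponentially:
\[
  \rho_{\lambda}(\mathrm{d}h) \;\propto\; \exp\bigl(-\lambda\,\widehat{R}_{n}(h)\bigr)\,\pi(\mathrm{d}h).
\]
```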
no code implementations • 15 Oct 2018 • Matthew J. Holland
To improve the off-sample generalization of classical procedures minimizing the empirical risk under potentially heavy-tailed data, new robust learning algorithms have been proposed in recent years, with generalized median-of-means strategies being particularly salient.
1 code implementation • 11 Oct 2018 • Matthew J. Holland
In this work, we study a new approach to optimizing the margin distribution realized by binary classifiers.
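To fix ideas, for a binary classifier with score function $f_{\theta}$ the margin on an example is $m = y\,f_{\theta}(x)$, and margin-distribution methods control statistics of $m$ beyond its minimum; the mean-variance form below is schematic only, not the exact objective of the paper:

```latex
\[
  m(x, y) \;=\; y\,f_{\theta}(x), \qquad y \in \{-1, +1\},
  \qquad
  \max_{\theta}\;\; \mathbb{E}\bigl[m(X, Y)\bigr] \;-\; \eta\,\operatorname{Var}\bigl[m(X, Y)\bigr].
\]
```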
no code implementations • 1 Jun 2017 • Matthew J. Holland, Kazushi Ikeda
Minimizing the empirical risk is a popular training strategy, but for learning tasks where the data may be noisy or heavy-tailed, one may require many observations in order to generalize well.
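For reference, empirical risk minimization selects a hypothesis by minimizing the sample-average loss:

```latex
\[
  \widehat{h} \;\in\; \operatorname*{arg\,min}_{h \in \mathcal{H}}\;
  \frac{1}{n}\sum_{i=1}^{n}\ell(h; z_{i}).
\]
```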