no code implementations • 15 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
We consider data with binary labels that are generated by an XOR-like function of the input features.
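As a toy illustration of such a labeling rule (the distribution, dimensions, and coordinates below are assumptions for the sketch, not the paper's exact construction), binary labels can be produced by XOR-ing the signs of two designated input coordinates:

```python
# Hypothetical sketch: binary labels from an XOR-like rule on two coordinates,
# embedded in a higher-dimensional input. Not the paper's exact distribution.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 50                                   # illustrative sample size and dimension
X = rng.standard_normal((n, d))
# Label is +1 exactly when the first two coordinates have opposite signs (XOR).
y = np.where((X[:, 0] > 0) ^ (X[:, 1] > 0), 1, -1)
```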
no code implementations • 11 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent.
no code implementations • NeurIPS 2021 • Spencer Frei, Quanquan Gu
We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.
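Roughly, and only as an informal gloss (the precise definitions are in the paper), a proxy PL inequality relaxes the usual Polyak-Lojasiewicz condition by letting the gradient norm of the trained objective $F$ lower-bound the suboptimality of a different proxy objective $G$:

$$\|\nabla F(\mathbf{w})\|^2 \;\gtrsim\; G(\mathbf{w}) - \inf_{\mathbf{v}} G(\mathbf{v}),$$

so that gradient descent on $F$ still drives the proxy $G$ toward its infimum, even when $F$ itself is neither convex nor PL.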
no code implementations • 25 Jun 2021 • Spencer Frei, Difan Zou, Zixiang Chen, Quanquan Gu
We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension.
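A minimal sketch of such an iterative self-training loop, assuming a logistic surrogate loss and a fixed step size (both illustrative choices, not necessarily the exact procedure analyzed in the paper):

```python
# Hedged sketch of iterative self-training from a pseudolabeler beta_pl.
# The logistic loss, step size, and step count are illustrative assumptions.
import numpy as np

def self_train(beta_pl, X_unlabeled, steps=100, lr=0.1):
    """Refine a linear classifier using pseudolabels sgn(<beta_t, x>)."""
    beta = beta_pl.copy()
    for _ in range(steps):
        y_hat = np.sign(X_unlabeled @ beta)           # pseudolabels for the current iterate
        margins = y_hat * (X_unlabeled @ beta)
        # Average gradient of the logistic loss log(1 + exp(-margin)) on the pseudolabeled data.
        grad = -(X_unlabeled * (y_hat / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        beta = beta - lr * grad
    return beta
```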
no code implementations • 19 Apr 2021 • Difan Zou, Spencer Frei, Quanquan Gu
To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
1 code implementation • 4 Jan 2021 • Spencer Frei, Yuan Cao, Quanquan Gu
We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization.
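For concreteness, a minimal sketch of this setting (the width, loss, learning rate, and initialization below are illustrative assumptions, not the paper's exact setup):

```python
# Hedged sketch: a one-hidden-layer leaky ReLU network trained by SGD on the
# logistic loss, starting from an arbitrary (here: default PyTorch) initialization.
import torch
import torch.nn as nn

d, m = 20, 512                                    # input dimension, hidden width (illustrative)
model = nn.Sequential(nn.Linear(d, m), nn.LeakyReLU(0.1), nn.Linear(m, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.SoftMarginLoss()                     # logistic loss for labels in {-1, +1}

def sgd_step(x, y):
    """One SGD step on a minibatch (x, y) with y in {-1, +1}."""
    opt.zero_grad()
    loss = loss_fn(model(x).squeeze(-1), y.float())
    loss.backward()
    opt.step()
    return loss.item()
```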
no code implementations • 1 Oct 2020 • Spencer Frei, Yuan Cao, Quanquan Gu
We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces.
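As a reminder of why such surrogates are useful (a standard observation, stated here only for context): for any nonincreasing, nonnegative surrogate $\ell$ with $\ell(0) > 0$, such as the logistic loss $\ell(z) = \log(1 + e^{-z})$, the zero-one loss is pointwise controlled by the surrogate,

$$\mathbf{1}\{\, y\,\langle \mathbf{w}, \mathbf{x}\rangle \le 0 \,\} \;\le\; \frac{\ell\bigl(y\,\langle \mathbf{w}, \mathbf{x}\rangle\bigr)}{\ell(0)},$$

so bounding the population surrogate risk of a gradient descent iterate also bounds its classification error.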
no code implementations • NeurIPS 2020 • Spencer Frei, Yuan Cao, Quanquan Gu
In the agnostic PAC learning setting, where no assumption on the relationship between the labels $y$ and the input $x$ is made, if the optimal population risk is $\mathsf{OPT}$, we show that gradient descent achieves population risk $O(\mathsf{OPT})+\epsilon$ in polynomial time and sample complexity when $\sigma$ is strictly increasing.
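A minimal sketch of the underlying procedure, assuming the squared loss and a sigmoid activation (illustrative choices only; the guarantee in the paper concerns strictly increasing $\sigma$ under its stated conditions):

```python
# Hedged sketch: full-batch gradient descent on the squared loss of a single
# neuron sigma(<w, x>). Activation, step size, and step count are illustrative.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))               # sigmoid: strictly increasing

def gd_single_neuron(X, y, steps=500, lr=0.5):
    """Fit w to minimize (1/2n) * sum_i (sigma(<w, x_i>) - y_i)^2."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigma(X @ w)
        # Chain rule: residual * sigma'(z) * x, averaged over the sample.
        grad = ((p - y) * p * (1 - p))[:, None] * X
        w -= lr * grad.mean(axis=0)
    return w
```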
no code implementations • NeurIPS 2019 • Spencer Frei, Yuan Cao, Quanquan Gu
The skip-connections used in residual networks have become a standard architectural choice in deep learning because of the improved training stability and generalization performance they provide, although there has been limited theoretical understanding of this improvement.