Search Results for author: Spencer Frei

Found 15 papers, 1 paper with code

Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks

no code implementations • NeurIPS 2019 • Spencer Frei, Yuan Cao, Quanquan Gu

The skip connections used in residual networks have become a standard architecture choice in deep learning due to the increased training stability and generalization performance of this architecture, although there has been limited theoretical understanding of this improvement.

Generalization Bounds

Agnostic Learning of a Single Neuron with Gradient Descent

no code implementations • NeurIPS 2020 • Spencer Frei, Yuan Cao, Quanquan Gu

In the agnostic PAC learning setting, where no assumption on the relationship between the labels $y$ and the input $x$ is made, if the optimal population risk is $\mathsf{OPT}$, we show that gradient descent achieves population risk $O(\mathsf{OPT})+\epsilon$ in polynomial time and sample complexity when $\sigma$ is strictly increasing.

PAC learning
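A minimal sketch of the setting described above, assuming softplus as the strictly increasing activation, Gaussian inputs, and labels that do not come from any neuron; these choices are illustrative assumptions, not the paper's exact setup.

import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d)) + 0.3 * rng.normal(size=n)  # agnostic labels: no single neuron fits them exactly

def sigma(z):                      # softplus: a strictly increasing activation
    return np.log1p(np.exp(z))

def sigma_prime(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr = 0.1
for _ in range(500):
    z = X @ w
    w -= lr * X.T @ ((sigma(z) - y) * sigma_prime(z)) / n  # gradient of 0.5 * mean((sigma(<w, x>) - y)^2)

print("empirical risk:", 0.5 * np.mean((sigma(X @ w) - y) ** 2))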

Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins

no code implementations • 1 Oct 2020 • Spencer Frei, Yuan Cao, Quanquan Gu

We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces.

General Classification
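A minimal sketch of gradient descent on a convex surrogate (here the logistic loss) for learning a halfspace under label noise; the noise rate, data distribution, and step size are illustrative assumptions rather than the paper's.

import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 10
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
y[rng.random(n) < 0.05] *= -1        # 5% flipped labels, so the best halfspace has OPT ~ 0.05

w = np.zeros(d)
lr = 0.5
for _ in range(1000):
    margins = y * (X @ w)
    # gradient of the convex surrogate mean(log(1 + exp(-y <w, x>)))
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

print("zero-one training error of the GD iterate:", np.mean(np.sign(X @ w) != y))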

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

1 code implementation • 4 Jan 2021 • Spencer Frei, Yuan Cao, Quanquan Gu

We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization.
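A rough sketch of the architecture and training procedure named above (one-hidden-layer leaky ReLU network, online SGD from a generic initialization), with an illustrative noisy-halfspace data model; the hyperparameters and noise model are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(2)
d, m = 20, 50                                     # input dimension, hidden width (arbitrary)
alpha = 0.1                                       # leaky ReLU slope
W = rng.normal(size=(m, d))                       # arbitrary initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second-layer weights
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

def leaky(h):
    return np.where(h > 0, h, alpha * h)

lr = 0.05
for _ in range(5000):                             # online SGD: one fresh example per step
    x = rng.normal(size=d)
    y = np.sign(x @ w_star)
    if rng.random() < 0.05:                       # label noise
        y = -y
    h = W @ x
    f = a @ leaky(h)
    dloss = -y / (1.0 + np.exp(y * f))            # derivative of the logistic loss in f
    W -= lr * np.outer(dloss * a * np.where(h > 0, 1.0, alpha), x)

X_test = rng.normal(size=(2000, d))
y_test = np.sign(X_test @ w_star)
preds = np.sign(leaky(X_test @ W.T) @ a)
print("clean test error:", np.mean(preds != y_test))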

Provable Robustness of Adversarial Training for Learning Halfspaces with Noise

no code implementations • 19 Apr 2021 • Difan Zou, Spencer Frei, Quanquan Gu

To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.

Classification • General Classification • +1
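For intuition, a hedged sketch of l2 adversarial training for a linear halfspace with random label noise; it uses the fact that, for a linear model, the worst-case l2 perturbation of radius r reduces the margin $y\langle w, x\rangle$ by exactly $r\|w\|$. The radius, noise rate, and loss below are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

rng = np.random.default_rng(3)
n, d, r = 2000, 10, 0.1                           # r: l2 perturbation radius (illustrative)
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
y[rng.random(n) < 0.05] *= -1                     # random label noise

w = 0.01 * rng.normal(size=d)
lr = 0.1
for _ in range(2000):
    # robust margin: a worst-case l2 attack of radius r reduces y<w, x> by r*||w||
    margins = y * (X @ w) - r * np.linalg.norm(w)
    s = 1.0 / (1.0 + np.exp(margins))             # logistic weight on each example
    grad = (-(X * (y * s)[:, None]).mean(axis=0)
            + r * s.mean() * w / (np.linalg.norm(w) + 1e-12))
    w -= lr * grad

robust_margins = y * (X @ w) - r * np.linalg.norm(w)
print("robust training error (points an l2 attack of radius r can misclassify):",
      np.mean(robust_margins <= 0))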

Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent

no code implementations • NeurIPS 2021 • Spencer Frei, Quanquan Gu

We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.

Self-training Converts Weak Learners to Strong Learners in Mixture Models

no code implementations • 25 Jun 2021 • Spencer Frei, Difan Zou, Zixiang Chen, Quanquan Gu

We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension.

Binary Classification
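A minimal sketch of the iterative self-training scheme described above on a two-component Gaussian mixture; the pseudolabeler quality, the plug-in update used to refit $\boldsymbol{\beta}_t$, and the number of iterations are illustrative assumptions rather than the paper's exact procedure.

import numpy as np

rng = np.random.default_rng(4)
d, n_unlabeled = 50, 20000
mu = np.zeros(d)
mu[0] = 2.0                                       # mixture mean; the Bayes classifier is sign(<mu, x>)
labels = rng.choice([-1.0, 1.0], size=n_unlabeled)
X = labels[:, None] * mu + rng.normal(size=(n_unlabeled, d))

beta = mu + 0.8 * rng.normal(size=d)              # weak but nontrivial pseudolabeler beta_0 := beta_pl
beta /= np.linalg.norm(beta)

for t in range(20):
    pseudo = np.sign(X @ beta)                    # hat y = sgn(<beta_t, x>)
    beta = (pseudo[:, None] * X).mean(axis=0)     # refit to own pseudolabels (simple plug-in mean update)
    beta /= np.linalg.norm(beta)

print("alignment with the Bayes direction:", beta @ mu / np.linalg.norm(mu))
print("error against the true labels:", np.mean(np.sign(X @ beta) != labels))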

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

no code implementations • 11 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent.

Random Feature Amplification: Feature Learning and Generalization in Neural Networks

no code implementations • 15 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

We consider data with binary labels that are generated by an XOR-like function of the input features.
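A small sketch of what binary labels generated by an XOR-like function of the input features can look like; the 2-sparse parity below is an illustrative instance, not necessarily the paper's exact data model.

import numpy as np

rng = np.random.default_rng(5)
n, d = 1000, 20
X = rng.choice([-1.0, 1.0], size=(n, d))          # boolean-like input features
y = X[:, 0] * X[:, 1]                             # XOR (2-sparse parity) of the first two coordinates
# No linear function of the raw inputs predicts y better than chance,
# yet a two-layer network can represent the product x_1 * x_2 exactly.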

Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

no code implementations • 13 Oct 2022 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.

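A quick numerical illustration of the nearly-orthogonal property mentioned above: for i.i.d. Gaussian inputs with dimension much larger than sample size, pairwise inner products are an order of magnitude smaller than squared norms. The dimensions below are arbitrary.

import numpy as np

rng = np.random.default_rng(6)
n, d = 50, 10000                                  # many more dimensions than samples
X = rng.normal(size=(n, d))
G = X @ X.T                                       # Gram matrix of the training inputs
off_diag = np.abs(G - np.diag(np.diag(G))).max()
print("largest |<x_i, x_j>| for i != j:", off_diag)     # scales like sqrt(d)
print("smallest ||x_i||^2:", np.diag(G).min())          # scales like d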

Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

no code implementations • 2 Mar 2023 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro

Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization.
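For reference, the margin-maximization problem and its KKT conditions referred to above can be stated as follows for a classifier $f(\theta; \mathbf{x})$ (this is the standard formulation; the paper's precise setting and normalization may differ):

$\min_{\theta} \ \tfrac{1}{2}\|\theta\|^2 \quad \text{subject to} \quad y_i\, f(\theta; \mathbf{x}_i) \ge 1 \ \text{for all } i,$

with KKT conditions (stationarity, primal and dual feasibility, complementary slackness)

$\theta = \sum_i \lambda_i\, y_i\, \nabla_\theta f(\theta; \mathbf{x}_i), \qquad y_i\, f(\theta; \mathbf{x}_i) \ge 1, \qquad \lambda_i \ge 0, \qquad \lambda_i\,\big(y_i\, f(\theta; \mathbf{x}_i) - 1\big) = 0.$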

Trained Transformers Learn Linear Models In-Context

no code implementations • 16 Jun 2023 • Ruiqi Zhang, Spencer Frei, Peter L. Bartlett

We show that although gradient flow succeeds at finding a global minimum in this setting, the trained transformer is still brittle under mild covariate shifts.

In-Context Learning • regression
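A minimal sketch of the in-context linear-regression task referenced above: each prompt consists of $(x_i, y_i)$ pairs drawn from a fresh linear model plus a query point, and a covariate shift can be simulated by rescaling the query-time inputs. The prompt sizes, scales, and the least-squares "oracle" below are illustrative assumptions, not the paper's construction.

import numpy as np

rng = np.random.default_rng(7)
d, n_ctx = 5, 20                                  # covariate dimension, context length

def make_prompt(shift_scale=1.0):
    w = rng.normal(size=d)                        # fresh linear model for each prompt
    X_ctx = rng.normal(size=(n_ctx, d))
    y_ctx = X_ctx @ w
    x_query = shift_scale * rng.normal(size=d)    # covariate shift applied at query time
    return X_ctx, y_ctx, x_query, x_query @ w

# An in-context learner that ran least squares on the prompt would predict exactly
# here; the paper asks when a trained transformer behaves like this and shows the
# learned behavior can be brittle under mild shifts of the covariate distribution.
X_ctx, y_ctx, x_query, target = make_prompt(shift_scale=3.0)
w_hat = np.linalg.lstsq(X_ctx, y_ctx, rcond=None)[0]
print("least-squares in-context prediction error:", abs(x_query @ w_hat - target))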

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

no code implementations • 6 Aug 2023 • Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu

On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting.

feature selection
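A generic probe of the batch-size comparison described above, using a tied rank-1 autoencoder $f(x) = \langle w, x\rangle\, w$ trained with full-batch GD versus batch-size-1 SGD, then reporting alignment with the initialization and a crude sparsity measure. The architecture, data, and hyperparameters are illustrative assumptions and not the paper's setup.

import numpy as np

rng = np.random.default_rng(8)
n, d = 200, 100
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < 0.05)   # sparse inputs (illustrative)

def grad(B, w):
    # gradient of 0.5 * mean ||<w, x> w - x||^2 over the batch B
    a = B @ w
    R = a[:, None] * w[None, :] - B
    return (a[:, None] * R + (R @ w)[:, None] * B).mean(axis=0)

def train(batch_size, steps=5000, lr=0.02):
    w = rng.normal(size=d) / np.sqrt(d)
    w0 = w.copy()
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        w -= lr * grad(X[idx], w)
    alignment = w @ w0 / (np.linalg.norm(w) * np.linalg.norm(w0))
    sparsity = np.mean(np.abs(w) < 0.01 * np.abs(w).max())
    return alignment, sparsity

print("full-batch GD    (align, sparsity):", train(batch_size=n))
print("batch-size-1 SGD (align, sparsity):", train(batch_size=1))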

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

no code implementations • 4 Oct 2023 • Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu

Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training.
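A sketch of how one might watch for the two phases described above: train a two-layer ReLU network on XOR cluster data and log train/test accuracy over time. All sizes and hyperparameters below are illustrative, and whether the train/test gap opens and later closes ("grokking") depends on the dimension, sample size, and training length.

import numpy as np

rng = np.random.default_rng(9)
d, n, m = 100, 100, 256                           # input dim, training size, hidden width
mu1 = np.zeros(d)
mu1[0] = 4.0
mu2 = np.zeros(d)
mu2[1] = 4.0

def sample(size):
    c = rng.integers(0, 4, size=size)             # four clusters: +mu1, -mu1, +mu2, -mu2
    centers = np.stack([mu1, -mu1, mu2, -mu2])[c]
    y = np.where(c < 2, 1.0, -1.0)                # XOR cluster labels: ±mu1 -> +1, ±mu2 -> -1
    return centers + rng.normal(size=(size, d)), y

Xtr, ytr = sample(n)
Xte, yte = sample(2000)

W = rng.normal(size=(m, d)) / np.sqrt(d)          # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer
lr = 0.5
for step in range(1, 5001):
    H = np.maximum(Xtr @ W.T, 0.0)                # hidden activations, shape (n, m)
    f = H @ a
    g = -ytr / (1.0 + np.exp(ytr * f)) / n        # derivative of mean logistic loss in f
    W -= lr * ((g[:, None] * (H > 0) * a).T @ Xtr)
    if step % 500 == 0:
        train_acc = np.mean(np.sign(np.maximum(Xtr @ W.T, 0.0) @ a) == ytr)
        test_acc = np.mean(np.sign(np.maximum(Xte @ W.T, 0.0) @ a) == yte)
        print(step, "train acc:", train_acc, "test acc:", test_acc)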

Minimum-Norm Interpolation Under Covariate Shift

no code implementations • 31 Mar 2024 • Neil Mallinar, Austin Zane, Spencer Frei, Bin Yu

We follow our analysis with empirical studies that show these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension is larger than the training sample size.

regression • Transfer Learning
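A minimal sketch of the object under study: the minimum-norm interpolator for $n < d$, evaluated on a test distribution whose covariance differs from the training one. The specific covariances and the anisotropic shift below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(10)
n, d = 100, 500                                   # fewer samples than dimensions
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))                       # isotropic training covariates
y = X @ w_star + 0.1 * rng.normal(size=n)

w_mn = X.T @ np.linalg.solve(X @ X.T, y)          # minimum-norm interpolator, equal to pinv(X) @ y

scales = np.ones(d)
scales[: d // 10] = 3.0                           # inflate the test covariance along a few directions
tests = {
    "in-distribution": rng.normal(size=(20000, d)),
    "covariate shift": rng.normal(size=(20000, d)) * scales,
}
for name, X_test in tests.items():
    excess_risk = np.mean((X_test @ w_mn - X_test @ w_star) ** 2)
    print(name, "excess risk:", excess_risk)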
