1 code implementation • 19 Sep 2022 • Niladri S. Chatterji, Philip M. Long
We bound the excess risk of interpolating deep linear networks trained using gradient flow.
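As a rough illustration of the setting (not the paper's analysis), the sketch below runs gradient descent, a discretization of gradient flow, on a depth-3 linear network until it nearly interpolates noisy linear data; the dimensions, initialization scale, and step size are illustrative assumptions.

```python
import numpy as np

# Sketch: gradient descent (a discretization of gradient flow) on a depth-3
# linear network f(x) = W3 W2 W1 x, fit to noisy linear data in the
# overparameterized regime (d > n) so the training MSE is driven toward zero.
rng = np.random.default_rng(0)
n, d = 20, 50
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

W1 = rng.standard_normal((d, d)) / np.sqrt(d)
W2 = rng.standard_normal((d, d)) / np.sqrt(d)
W3 = rng.standard_normal((1, d)) / np.sqrt(d)

lr = 1e-2
for _ in range(20000):
    w = W3 @ W2 @ W1                          # effective 1 x d linear map
    resid = X @ w.ravel() - y
    G = (resid @ X)[None, :] / n              # gradient of the squared loss wrt w
    gW1 = (W3 @ W2).T @ G                     # chain rule through the product
    gW2 = W3.T @ G @ W1.T
    gW3 = G @ (W2 @ W1).T
    W1 -= lr * gW1
    W2 -= lr * gW2
    W3 -= lr * gW3

print("training MSE:", np.mean((X @ (W3 @ W2 @ W1).ravel() - y) ** 2))
```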
1 code implementation • 26 May 2022 • Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto
While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves close to state-of-the-art accuracy across several popular benchmarks.
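A minimal sketch of this baseline, assuming synthetic data and a hypothetical group labelling; the classifier trained afterwards is left unspecified.

```python
import numpy as np

# Sketch of the undersampling baseline: subsample every group down to the size
# of the smallest group, then train any standard classifier on the balanced subset.
def undersample(X, y, groups, seed=0):
    """Return a subset of (X, y) with an equal number of points per group."""
    rng = np.random.default_rng(seed)
    unique, counts = np.unique(groups, return_counts=True)
    m = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(groups == g), size=m, replace=False)
        for g in unique
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Illustrative usage with a 90/10 group imbalance; downstream training is omitted.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
groups = (rng.random(1000) < 0.9).astype(int)
y = (X[:, 0] > 0).astype(int)
X_bal, y_bal = undersample(X, y, groups)
print("balanced subset size:", len(y_bal))
```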
no code implementations • 15 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
We consider data with binary labels that are generated by an XOR-like function of the input features.
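A minimal sketch of one XOR-like labelling rule (the exact distribution in the paper may differ): the label is the sign of the product of two designated coordinates, so no linear classifier can fit it.

```python
import numpy as np

# Sketch: binary labels given by an XOR-like function of two input coordinates,
# with optional label noise. Dimensions and the noise rate are illustrative.
rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1])          # +1 when the two features agree in sign
flip = rng.random(n) < 0.05             # small fraction of noisy labels
y[flip] *= -1
```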
no code implementations • 11 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent.
1 code implementation • ICLR 2022 • Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto
As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models.
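As a hedged illustration, the snippet below defines one possible margin loss with a polynomial (rather than exponential) tail and plugs it into an importance-weighted empirical risk; the particular form $\ell(z) = (1+z)^{-\alpha}$ for $z \ge 0$ is an assumption of this sketch, not necessarily the loss family analyzed in the paper.

```python
import numpy as np

# Sketch: an importance-weighted empirical risk with a polynomially-tailed
# margin loss, in contrast to exponentially-tailed losses such as the logistic loss.
def poly_tailed_loss(margins, alpha=1.0):
    """Polynomial decay for positive margins, linear penalty for negative ones."""
    pos = np.maximum(margins, 0.0)
    return np.where(margins >= 0,
                    (1.0 + pos) ** (-alpha),   # heavy polynomial tail
                    1.0 - margins)             # continuous at zero

def weighted_risk(w, X, y, weights, alpha=1.0):
    """Importance-weighted average loss for a linear model with labels in {-1, +1}."""
    margins = y * (X @ w)
    return np.mean(weights * poly_tailed_loss(margins, alpha))
```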
no code implementations • 6 Oct 2021 • Niladri S. Chatterji, Philip M. Long
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime.
no code implementations • 25 Aug 2021 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
The recent success of neural network models has shed light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data.
no code implementations • NeurIPS 2021 • Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan
We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode.
no code implementations • 9 Feb 2021 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence.
no code implementations • 4 Dec 2020 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss.
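A minimal numpy sketch of this kind of training run, under several simplifying assumptions of mine: a quadratic smoothing of the ReLU near zero, a fixed random second layer, and untuned hyperparameters.

```python
import numpy as np

# Sketch: a width-m two-layer network with a smoothed ReLU activation, trained
# on the logistic loss by gradient descent on the first layer only.
rng = np.random.default_rng(0)
n, d, m, eps, lr = 100, 10, 64, 0.5, 0.1

def srelu(z):                      # quadratic smoothing of the ReLU near zero
    return np.where(z <= 0, 0.0, np.where(z <= eps, z**2 / (2 * eps), z - eps / 2))

def srelu_grad(z):
    return np.where(z <= 0, 0.0, np.where(z <= eps, z / eps, 1.0))

X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(n))     # labels in {-1, +1}
W = rng.standard_normal((m, d)) / np.sqrt(d)            # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)        # fixed second layer

for _ in range(2000):
    Z = X @ W.T                                         # (n, m) pre-activations
    f = srelu(Z) @ a                                    # network outputs
    g = -y / (1.0 + np.exp(y * f))                      # d(logistic loss)/d(output)
    # Backpropagate through the fixed second layer and the smoothed ReLU.
    gW = ((g[:, None] * srelu_grad(Z)) * a[None, :]).T @ X / n
    W -= lr * gW

print("train error:", np.mean(np.sign(srelu(X @ W.T) @ a) != y))
```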
no code implementations • 25 Apr 2020 • Niladri S. Chatterji, Philip M. Long
We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification.
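For illustration, the hard-margin maximum margin classifier can be approximated with a soft-margin linear SVM using a very large regularization parameter C; the Gaussian mixture with label noise below is only a stand-in for the paper's data model.

```python
import numpy as np
from sklearn.svm import SVC

# Sketch: maximum margin (hard-margin SVM) classification of two-class data
# that remain linearly separable despite label noise, since d >> n.
rng = np.random.default_rng(0)
n, d = 50, 500
mu = np.zeros(d); mu[0] = 2.0                 # class mean direction
y = rng.choice([-1, 1], size=n)
X = y[:, None] * mu[None, :] + rng.standard_normal((n, d))
flip = rng.random(n) < 0.1                    # label noise
y[flip] *= -1

clf = SVC(kernel="linear", C=1e6)             # very large C approximates hard margin
clf.fit(X, y)
print("training error:", np.mean(clf.predict(X) != y))   # expected 0: data separable
```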
no code implementations • 1 Feb 2020 • Niladri S. Chatterji, Peter L. Bartlett, Philip M. Long
We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed.
no code implementations • ICLR 2020 • Niladri S. Chatterji, Behnam Neyshabur, Hanie Sedghi
We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others.
no code implementations • 30 May 2019 • Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett
Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.
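A minimal sketch of the unadjusted Langevin Monte Carlo update, using a standard Gaussian potential purely for illustration; step size and iteration count are not tuned.

```python
import numpy as np

# Sketch of (unadjusted) Langevin Monte Carlo: a gradient step on the negative
# log density U plus Gaussian noise.
def grad_U(x):
    return x                                   # U(x) = ||x||^2 / 2

def lmc(x0, step=0.01, n_iters=5000, rng=np.random.default_rng(0)):
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
        samples.append(x.copy())
    return np.array(samples)

samples = lmc(np.zeros(2))
print("sample covariance:\n", np.cov(samples[1000:].T))   # roughly the identity
```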
no code implementations • 24 May 2019 • Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett
We consider the stochastic linear (multi-armed) contextual bandit problem, allowing for the possibility that it hides a simpler multi-armed bandit structure in which the rewards are independent of the contextual information.
no code implementations • 4 May 2018 • Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan
We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
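One concrete potential of this kind (my illustrative choice, not taken from the paper) is a quadratic plus a Gaussian bump at the origin: the bump makes $U$ nonconvex inside a ball when its height is large enough, while $U$ stays smooth everywhere and is strongly convex once the bump has decayed.

```python
import numpy as np

# Sketch: a potential that is L-smooth everywhere, nonconvex near the origin,
# and strongly convex outside a ball. Bump height a and width s are illustrative.
a, s = 3.0, 1.0          # a > s**2 makes the Hessian negative at the origin

def U(x):
    r2 = np.sum(x**2)
    return 0.5 * r2 + a * np.exp(-r2 / (2 * s**2))

def grad_U(x):
    r2 = np.sum(x**2)
    return x - (a / s**2) * np.exp(-r2 / (2 * s**2)) * x
```

Sampling from $p^*(x) \propto \exp(-U(x))$ could then proceed with a Langevin scheme such as the one sketched above for Langevin Monte Carlo.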
no code implementations • 27 Feb 2018 • Aldo Pacchiano, Niladri S. Chatterji, Peter L. Bartlett
We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
no code implementations • ICML 2018 • Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan
We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion.
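The sketch below shows an SVRG-style variance-reduced gradient estimate inside an overdamped Langevin update for a finite-sum potential $U(x) = \sum_i U_i(x)$; the quadratic components, epoch length, and step size are illustrative assumptions, and the control-variate underdamped variant is not shown.

```python
import numpy as np

# Sketch of SVRG Langevin diffusion: variance-reduced stochastic gradients
# (anchored at a per-epoch snapshot) inside a Langevin update.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))          # centres of the component potentials

def grad_Ui(x, i):                            # U_i(x) = ||x - data[i]||^2 / (2 N)
    return (x - data[i]) / len(data)

def full_grad(x):                             # sum_i grad U_i(x)
    return x - data.mean(axis=0)

def svrg_langevin(x0, step=1e-3, epochs=50, epoch_len=200):
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        snapshot, g_snap = x.copy(), full_grad(x)   # anchor for variance reduction
        for _ in range(epoch_len):
            i = rng.integers(len(data))
            # Unbiased, variance-reduced estimate of grad U(x).
            g = len(data) * (grad_Ui(x, i) - grad_Ui(snapshot, i)) + g_snap
            x = x - step * g + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

print(svrg_langevin(np.zeros(2)))             # a draw roughly near the data mean
```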
no code implementations • NeurIPS 2017 • Niladri S. Chatterji, Peter L. Bartlett
We present theoretical guarantees for an alternating minimization algorithm for the dictionary learning/sparse coding problem.
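A rough sketch of the alternating structure (warm start, hard-thresholded sparse coding step, least-squares dictionary update); the sizes, sparsity level, and thresholding rule are my illustrative choices, not the exact algorithm whose guarantees the paper proves.

```python
import numpy as np

# Sketch of alternating minimization for dictionary learning Y ~ A X with sparse X.
rng = np.random.default_rng(0)
d, K, n, s = 40, 30, 500, 3                    # signal dim, atoms, samples, sparsity

A_true = rng.standard_normal((d, K))
A_true /= np.linalg.norm(A_true, axis=0)
X_true = np.zeros((K, n))
for j in range(n):
    idx = rng.choice(K, size=s, replace=False)
    X_true[idx, j] = rng.standard_normal(s)
Y = A_true @ X_true

A = A_true + 0.1 * rng.standard_normal((d, K)) # warm start near the true dictionary
A /= np.linalg.norm(A, axis=0)
for _ in range(20):
    # Sparse coding step: least squares, then keep the s largest coefficients per column.
    X = np.linalg.lstsq(A, Y, rcond=None)[0]
    thresh = np.sort(np.abs(X), axis=0)[-s]
    X[np.abs(X) < thresh] = 0.0
    # Dictionary update step: least squares on the current codes, then renormalize.
    A = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
    A /= np.linalg.norm(A, axis=0)

print("relative reconstruction error:", np.linalg.norm(Y - A @ X) / np.linalg.norm(Y))
```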
no code implementations • 12 Jul 2017 • Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan
We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave.
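A minimal sketch of a naive Euler-type discretization of the underdamped Langevin diffusion for a Gaussian target (the paper analyzes the continuous-time diffusion and a sharper discretization); the friction, step size, and iteration count are illustrative.

```python
import numpy as np

# Sketch: underdamped Langevin dynamics, where position x and velocity v evolve
# jointly, with friction gamma and noise injected only into the velocity.
def grad_U(x):
    return x                                   # U(x) = ||x||^2 / 2: smooth, strongly convex

def underdamped_langevin(x0, gamma=2.0, u=1.0, step=0.01, n_iters=20000,
                         rng=np.random.default_rng(0)):
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    xs = []
    for _ in range(n_iters):
        v = v - step * (gamma * v + u * grad_U(x)) \
            + np.sqrt(2.0 * gamma * u * step) * rng.standard_normal(x.shape)
        x = x + step * v
        xs.append(x.copy())
    return np.array(xs)

xs = underdamped_langevin(np.zeros(2))
print("marginal variance of x:", xs[5000:].var(axis=0))   # roughly 1 for this target
```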