Search Results for author: Niladri S. Chatterji

Found 20 papers, 3 papers with code

Underdamped Langevin MCMC: A non-asymptotic analysis

no code implementations 12 Jul 2017 Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan

We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave.
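
For orientation, here is a minimal sketch of an underdamped Langevin sampler. It uses a plain Euler-Maruyama discretization of the underdamped diffusion rather than the sharper integrator analyzed in the paper, and the step size, friction parameter gamma, and Gaussian example target are illustrative choices only.

    import numpy as np

    def underdamped_langevin(grad_f, x0, n_steps, step=1e-2, gamma=2.0, u=1.0, rng=None):
        """Euler-Maruyama discretization of the underdamped Langevin diffusion
        dv = -gamma*v dt - u*grad_f(x) dt + sqrt(2*gamma*u) dB,  dx = v dt.
        Illustrative sketch only; the paper analyzes a sharper discretization."""
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float).copy()
        v = np.zeros_like(x)
        samples = []
        for _ in range(n_steps):
            noise = rng.standard_normal(x.shape)
            v = v - step * (gamma * v + u * grad_f(x)) + np.sqrt(2 * gamma * u * step) * noise
            x = x + step * v
            samples.append(x.copy())
        return np.array(samples)

    # Example: sample from a standard Gaussian in 2 dimensions, where f(x) = ||x||^2 / 2.
    draws = underdamped_langevin(grad_f=lambda x: x, x0=np.zeros(2), n_steps=5_000)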

Alternating minimization for dictionary learning: Local Convergence Guarantees

no code implementations NeurIPS 2017 Niladri S. Chatterji, Peter L. Bartlett

We present theoretical guarantees for an alternating minimization algorithm for the dictionary learning/sparse coding problem.

Dictionary Learning
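
As a rough illustration of the alternating scheme, the sketch below alternates an ISTA sparse-coding step with a least-squares dictionary update. The thresholding level, iteration counts, and update rules are generic choices and are not claimed to match the algorithm analyzed in the paper.

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def alt_min_dictionary_learning(Y, k, n_outer=50, n_ista=100, lam=0.1, rng=None):
        """Alternating minimization for Y ~ D @ A with sparse codes A:
        an ISTA sparse-coding step with D fixed, then a least-squares
        dictionary update with A fixed (illustrative variant)."""
        rng = np.random.default_rng() if rng is None else rng
        d, n = Y.shape
        D = rng.standard_normal((d, k))
        D /= np.linalg.norm(D, axis=0, keepdims=True)
        A = np.zeros((k, n))
        for _ in range(n_outer):
            # Sparse coding: ISTA on 0.5*||Y - D A||_F^2 + lam*||A||_1.
            L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
            for _ in range(n_ista):
                A = soft_threshold(A - (D.T @ (D @ A - Y)) / L, lam / L)
            # Dictionary update: least squares in D, then renormalize the columns.
            D = Y @ np.linalg.pinv(A)
            D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
        return D, A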

On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

no code implementations ICML 2018 Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion.
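
A compact sketch of the SVRG-style control-variate gradient inside a Langevin update is given below. The interface (a user-supplied minibatch gradient grad_i), the batch size, and the step size are assumptions for illustration and do not reproduce the exact schemes analyzed in the paper.

    import numpy as np

    def svrg_langevin(grad_i, n_data, theta0, n_epochs=20, inner=100,
                      batch=10, step=1e-3, rng=None):
        """grad_i(theta, idx) should return the average gradient of the
        potential over the data indices in idx."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        samples = []
        for _ in range(n_epochs):
            snapshot = theta.copy()
            full_grad = grad_i(snapshot, np.arange(n_data))    # full gradient at the snapshot
            for _ in range(inner):
                idx = rng.integers(0, n_data, size=batch)
                # Control-variate estimate: unbiased for the full gradient, lower variance.
                g = full_grad + grad_i(theta, idx) - grad_i(snapshot, idx)
                theta = theta - step * g + np.sqrt(2 * step) * rng.standard_normal(theta.shape)
                samples.append(theta.copy())
        return np.array(samples)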

Online learning with kernel losses

no code implementations 27 Feb 2018 Aldo Pacchiano, Niladri S. Chatterji, Peter L. Bartlett

We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
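
For reference, the basic exponential weights template in the full-information, finite-expert case is sketched below; the paper's contribution is the adaptation of this kind of update to losses that are kernel functions, so the finite expert set here is only a simplifying assumption.

    import numpy as np

    def exponential_weights(loss_matrix, eta=0.1, rng=None):
        """loss_matrix[t, i] is the loss of expert i in round t, assumed in [0, 1]."""
        rng = np.random.default_rng() if rng is None else rng
        T, K = loss_matrix.shape
        weights = np.ones(K)
        total_loss = 0.0
        for t in range(T):
            probs = weights / weights.sum()
            choice = rng.choice(K, p=probs)               # play an expert at random
            total_loss += loss_matrix[t, choice]
            weights *= np.exp(-eta * loss_matrix[t])      # full-information update of all experts
            weights /= weights.max()                      # rescale for numerical stability
        return total_loss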

Sharp convergence rates for Langevin dynamics in the nonconvex setting

no code implementations 4 May 2018 Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
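
One toy potential with this structure (smooth everywhere, nonconvex near the origin, strongly convex outside a ball), chosen purely for illustration and not taken from the paper:

    import numpy as np

    # U(x) = ||x||^2 / 2 + 2*exp(-||x||^2): the Gaussian bump makes the Hessian
    # negative definite at the origin, but it is negligible for large ||x||,
    # where U behaves like the strongly convex quadratic ||x||^2 / 2.
    def U(x):
        sq = np.dot(x, x)
        return 0.5 * sq + 2.0 * np.exp(-sq)

    def grad_U(x):
        sq = np.dot(x, x)
        return x - 4.0 * np.exp(-sq) * x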

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

no code implementations 24 May 2019 Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information.

Multi-Armed Bandits

Langevin Monte Carlo without smoothness

no code implementations 30 May 2019 Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett

Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.
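
A minimal sketch of the basic (overdamped) LMC update is shown below: only gradients of the negative log density are needed, never the normalizing constant. The step size and the Gaussian example are illustrative, and the paper itself treats potentials that need not be smooth.

    import numpy as np

    def langevin_monte_carlo(grad_U, x0, n_steps, step=1e-2, rng=None):
        """One gradient step plus Gaussian noise per iteration."""
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x0, dtype=float).copy()
        samples = []
        for _ in range(n_steps):
            x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
            samples.append(x.copy())
        return np.array(samples)

    # Sample approximately from N(0, I) in 2 dimensions, where U(x) = ||x||^2 / 2.
    draws = langevin_monte_carlo(grad_U=lambda x: x, x0=np.zeros(2), n_steps=10_000)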

Oracle Lower Bounds for Stochastic Gradient Sampling Algorithms

no code implementations 1 Feb 2020 Niladri S. Chatterji, Peter L. Bartlett, Philip M. Long

We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed.

Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime

no code implementations 25 Apr 2020 Niladri S. Chatterji, Philip M. Long

We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification.
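
For a concrete picture of the setting, the snippet below fits an approximate hard-margin (maximum margin) linear classifier to Gaussian data with more features than samples. Using LinearSVC with a very large C, and all of the constants, are stand-ins chosen for illustration rather than the paper's exact construction.

    import numpy as np
    from sklearn.svm import LinearSVC

    # Two-class Gaussian data in the overparameterized regime: d >> n, so the
    # training set is linearly separable with high probability.
    rng = np.random.default_rng(0)
    n, d = 50, 500
    X = rng.standard_normal((n, d))
    y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(n))

    # A very large C approximates the hard-margin (maximum margin) solution.
    clf = LinearSVC(C=1e6, loss="hinge", max_iter=100_000).fit(X, y)
    train_error = np.mean(clf.predict(X) != y)     # interpolation: expect 0 here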

When does gradient descent with logistic loss find interpolating two-layer networks?

no code implementations 4 Dec 2020 Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss.

Binary Classification
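
A small sketch of such a training run is given below, with Softplus standing in for the smoothed ReLU and the output weights frozen; the activation, width, initialization, and step size are illustrative assumptions rather than the paper's precise setup.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    n, d, width = 100, 20, 256
    X = torch.randn(n, d)
    y = (X[:, 0] > 0).float() * 2 - 1                  # +/-1 labels from a linear rule

    W = (torch.randn(width, d) / d ** 0.5).requires_grad_(True)          # trained hidden layer
    a = (torch.randint(0, 2, (width,)).float() * 2 - 1) / width ** 0.5   # frozen output weights

    def net(X):
        return F.softplus(X @ W.t()) @ a               # Softplus as a smoothed ReLU

    opt = torch.optim.SGD([W], lr=0.1)
    for _ in range(2_000):
        opt.zero_grad()
        loss = F.softplus(-y * net(X)).mean()          # logistic loss log(1 + exp(-y f(x)))
        loss.backward()
        opt.step()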

When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?

no code implementations 9 Feb 2021 Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence.

The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks

no code implementations 25 Aug 2021 Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data.

Foolish Crowds Support Benign Overfitting

no code implementations 6 Oct 2021 Niladri S. Chatterji, Philip M. Long

We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime.

Regression
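
One concrete sparse interpolating procedure of the kind this bound concerns is the minimum l1-norm interpolator (basis pursuit); the sketch below runs it on Gaussian data with more features than samples, with the dimensions and signal chosen for illustration.

    import numpy as np
    from scipy.optimize import linprog

    def min_l1_interpolator(X, y):
        """Basis pursuit: minimize ||w||_1 subject to X w = y, posed as a
        linear program via the split w = w_plus - w_minus with both parts >= 0."""
        n, d = X.shape
        c = np.ones(2 * d)
        A_eq = np.hstack([X, -X])
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
        return res.x[:d] - res.x[d:]

    # Gaussian design in the overparameterized regime, so exact interpolation is possible.
    rng = np.random.default_rng(0)
    n, d = 30, 200
    X = rng.standard_normal((n, d))
    y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)
    w_hat = min_l1_interpolator(X, y)
    assert np.allclose(X @ w_hat, y, atol=1e-5)        # the fit interpolates the noisy data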

Is Importance Weighting Incompatible with Interpolating Classifiers?

1 code implementation ICLR 2022 Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models.
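
As a sketch of what an importance-weighted, polynomially-tailed surrogate can look like, the loss below decays like (1 + m)^(-alpha) in the margin m and continues linearly for negative margins. It is one simple example of such a tail and is not claimed to be the exact loss proposed in the paper.

    import torch

    def poly_tail_loss(margins, alpha=1.0):
        """(1 + m)^(-alpha) for m >= 0, and the linear continuation 1 - alpha*m
        for m < 0; the two pieces agree (value 1, slope -alpha) at m = 0."""
        pos = (1.0 + margins.clamp(min=0.0)) ** (-alpha)
        neg = 1.0 - alpha * margins.clamp(max=0.0)
        return pos + neg - 1.0

    def weighted_risk(model, X, y, weights, alpha=1.0):
        """Importance-weighted empirical risk with the polynomially-tailed loss."""
        margins = y * model(X)
        return (weights * poly_tail_loss(margins, alpha)).mean()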

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

no code implementations 11 Feb 2022 Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent.

Random Feature Amplification: Feature Learning and Generalization in Neural Networks

no code implementations 15 Feb 2022 Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

We consider data with binary labels that are generated by an XOR-like function of the input features.
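
A minimal generator for data of this flavor is sketched below: the label follows an XOR-like rule on two coordinates, the remaining coordinates are uninformative, and a fraction of labels can be flipped. The distribution in the paper differs in its details; this only shows the shape of the task.

    import numpy as np

    def xor_like_data(n, d, noise_rate=0.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        X = rng.standard_normal((n, d))
        y = np.sign(X[:, 0] * X[:, 1])          # +1 exactly when the two signs agree
        flip = rng.random(n) < noise_rate       # flip a noise_rate fraction of labels
        y[flip] = -y[flip]
        return X, y

    X, y = xor_like_data(n=1_000, d=50, noise_rate=0.05)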

Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification

1 code implementation 26 May 2022 Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an undersampled balanced dataset often achieves close to state-of-the-art accuracy across several popular benchmarks.

Binary Classification
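
The undersampling baseline itself is simple to state; one straightforward way to build the class-balanced training set by random undersampling is sketched below (the sampling details are an illustrative choice).

    import numpy as np

    def undersample_to_balance(X, y, rng=None):
        """Randomly discard examples from the larger classes so that every
        class is represented by as many examples as the smallest class."""
        rng = np.random.default_rng() if rng is None else rng
        classes, counts = np.unique(y, return_counts=True)
        n_keep = counts.min()
        keep = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
            for c in classes
        ])
        rng.shuffle(keep)
        return X[keep], y[keep]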

Deep Linear Networks can Benignly Overfit when Shallow Ones Do

1 code implementation 19 Sep 2022 Niladri S. Chatterji, Philip M. Long

We bound the excess risk of interpolating deep linear networks trained using gradient flow.
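
For a concrete instance, the sketch below trains a deep linear network (a product of weight matrices with no nonlinearity) by gradient descent with a small constant step size as a crude stand-in for gradient flow; the depth, widths, and learning rate are illustrative choices, not those of the paper's experiments.

    import torch

    torch.manual_seed(0)
    n, d, width, depth = 40, 100, 100, 4
    X = torch.randn(n, d)
    y = X @ torch.randn(d) / d ** 0.5 + 0.1 * torch.randn(n)   # noisy linear targets

    layers = [torch.nn.Linear(d, width, bias=False)]
    layers += [torch.nn.Linear(width, width, bias=False) for _ in range(depth - 2)]
    layers += [torch.nn.Linear(width, 1, bias=False)]
    net = torch.nn.Sequential(*layers)                 # end-to-end map is linear in X

    opt = torch.optim.SGD(net.parameters(), lr=1e-2)
    for _ in range(10_000):
        opt.zero_grad()
        loss = ((net(X).squeeze(-1) - y) ** 2).mean()  # squared loss on the noisy targets
        loss.backward()
        opt.step()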
