Search Results for author: Peter L. Bartlett

Found 82 papers, 4 papers with code

Sharpness-Aware Minimization and the Edge of Stability

1 code implementation 21 Sep 2023 Philip M. Long, Peter L. Bartlett

Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value.
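
As a rough illustration of the quantity involved (not code from the paper), the sketch below tracks the operator norm of the loss Hessian (the "sharpness") of a small two-layer network trained by full-batch GD, estimating it by power iteration on finite-difference Hessian-vector products and comparing it with the $2/\eta$ threshold. The toy data, architecture, and step size are arbitrary choices, and whether the sharpness actually reaches $2/\eta$ depends on the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                     # toy inputs
y = np.sin(X @ rng.normal(size=4))               # toy regression targets
p = np.concatenate([rng.normal(size=64) / 4, rng.normal(size=16) / 4])   # flattened (W1, w2)

def unpack(p):
    return p[:64].reshape(4, 16), p[64:]

def loss(p):
    W1, w2 = unpack(p)
    return 0.5 * np.mean((np.tanh(X @ W1) @ w2 - y) ** 2)

def grad(p, eps=1e-5):
    g = np.zeros_like(p)                         # central finite-difference gradient
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = eps
        g[i] = (loss(p + e) - loss(p - e)) / (2 * eps)
    return g

def sharpness(p, iters=30, eps=1e-4):
    # Power iteration on finite-difference Hessian-vector products.
    v = rng.normal(size=p.size)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        Hv = (grad(p + eps * v) - grad(p - eps * v)) / (2 * eps)
        v = Hv / (np.linalg.norm(Hv) + 1e-12)
    return v @ ((grad(p + eps * v) - grad(p - eps * v)) / (2 * eps))

eta = 0.3
for t in range(301):
    p = p - eta * grad(p)
    if t % 100 == 0:
        print(f"step {t:3d}  loss {loss(p):.4f}  sharpness {sharpness(p):.2f}  2/eta {2 / eta:.2f}")
```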

Greedy Convex Ensemble

1 code implementation 9 Oct 2019 Tan Nguyen, Nan Ye, Peter L. Bartlett

Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull.

Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks

no code implementations ICML 2018 Peter L. Bartlett, David P. Helmbold, Philip M. Long

We provide polynomial bounds on the number of iterations for gradient descent to approximate the least squares matrix $\Phi$, in the case where the initial hypothesis $\Theta_1 = ... = \Theta_L = I$ has excess loss bounded by a small enough constant.

Best of many worlds: Robust model selection for online supervised learning

no code implementations 22 May 2018 Vidya Muthukumar, Mitas Ray, Anant Sahai, Peter L. Bartlett

We introduce algorithms for online, full-information prediction that are competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings.

Model Selection

Sharp convergence rates for Langevin dynamics in the nonconvex setting

no code implementations 4 May 2018 Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
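
A minimal sketch of the kind of sampler analyzed in this line of work is the unadjusted Langevin algorithm below, run on a toy double-well potential that is nonconvex near the origin and strongly convex far from it; the potential, step size, and iteration counts are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    # U(x) = 0.25 * ||x||^4 - 0.5 * ||x||^2, so grad U(x) = (||x||^2 - 1) * x:
    # nonconvex near the origin, strongly convex far away from it.
    return (x @ x - 1.0) * x

eta = 1e-3                          # step size (illustrative)
x = rng.normal(size=2)
samples = []
for k in range(100_000):
    x = x - eta * grad_U(x) + np.sqrt(2 * eta) * rng.normal(size=2)   # Euler-Maruyama step
    if k > 20_000:                  # discard burn-in
        samples.append(x.copy())

samples = np.array(samples)
print("empirical mean (target is symmetric, so ~0):", samples.mean(axis=0))
```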

Representing smooth functions as compositions of near-identity functions with implications for deep network optimization

no code implementations 13 Apr 2018 Peter L. Bartlett, Steven N. Evans, Philip M. Long

This implies that $h$ can be represented to any accuracy by a deep residual network whose nonlinear layers compute functions with a small Lipschitz constant.

Online learning with kernel losses

no code implementations 27 Feb 2018 Aldo Pacchiano, Niladri S. Chatterji, Peter L. Bartlett

We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.

On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

no code implementations ICML 2018 Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion.

Alternating minimization for dictionary learning: Local Convergence Guarantees

no code implementations NeurIPS 2017 Niladri S. Chatterji, Peter L. Bartlett

We present theoretical guarantees for an alternating minimization algorithm for the dictionary learning/sparse coding problem.

Dictionary Learning

Underdamped Langevin MCMC: A non-asymptotic analysis

no code implementations 12 Jul 2017 Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan

We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave.
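
For illustration only, the sketch below runs a plain Euler discretization of underdamped Langevin dynamics on a Gaussian target, to show the position/velocity structure of the sampler; the paper analyzes a sharper discretization, and the friction, step size, and target here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma_inv = np.array([[2.0, 0.5], [0.5, 1.0]])   # precision of a Gaussian target (smooth, strongly log-concave)
grad_U = lambda x: Sigma_inv @ x                 # U(x) = 0.5 * x^T Sigma_inv x

gamma, eta = 2.0, 1e-2                           # friction and step size (illustrative)
x, v = np.zeros(2), np.zeros(2)
xs = []
for k in range(100_000):
    # Plain Euler step for dv = (-gamma v - grad U(x)) dt + sqrt(2 gamma) dB,  dx = v dt
    v += -eta * gamma * v - eta * grad_U(x) + np.sqrt(2 * gamma * eta) * rng.normal(size=2)
    x += eta * v
    if k > 20_000:
        xs.append(x.copy())

print("empirical covariance:\n", np.cov(np.array(xs).T))
print("target covariance:\n", np.linalg.inv(Sigma_inv))
```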

FLAG n' FLARE: Fast Linearly-Coupled Adaptive Gradient Methods

no code implementations 26 May 2016 Xiang Cheng, Farbod Roosta-Khorasani, Stefan Palombo, Peter L. Bartlett, Michael W. Mahoney

We consider first order gradient methods for effectively optimizing a composite objective in the form of a sum of smooth and, potentially, non-smooth functions.

Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks

no code implementations 8 Mar 2017 Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian

We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function.

Acceleration and Averaging in Stochastic Mirror Descent Dynamics

no code implementations 19 Jul 2017 Walid Krichene, Peter L. Bartlett

We discuss the interaction between the parameters of the dynamics (learning rate and averaging weights) and the covariation of the noise process, and show, in particular, how the asymptotic rate of covariation affects the choice of parameters and, ultimately, the convergence rate.

Recovery Guarantees for One-hidden-layer Neural Networks

no code implementations ICML 2017 Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

For activation functions that are also smooth, we show $\mathit{local~linear~convergence}$ guarantees of gradient descent under a resampling rule.

Hit-and-Run for Sampling and Planning in Non-Convex Spaces

no code implementations 19 Oct 2016 Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek

We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.
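
A minimal sketch of a Hit-and-Run step with a membership oracle is given below: draw a random direction, restrict attention to the chord through the current point, and move to a point chosen uniformly from the feasible part of that chord. The toy non-convex set and the grid discretization of the chord are simplifying assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def member(x):
    # Toy non-convex set: the unit box with a disk of radius 0.5 removed from its centre.
    return bool((np.abs(x) <= 1.0).all() and (x @ x >= 0.25))

def hit_and_run_step(x, radius=2.0, grid=801):
    d = rng.normal(size=x.size)
    d /= np.linalg.norm(d)                      # uniform random direction
    ts = np.linspace(-radius, radius, grid)     # discretized chord through x
    feasible = [t for t in ts if member(x + t * d)]
    if not feasible:                            # nothing feasible found along this chord: stay put
        return x
    return x + rng.choice(feasible) * d         # uniform over the feasible chord points

x = np.array([0.9, 0.9])
for _ in range(1000):
    x = hit_and_run_step(x)
print("final sample:", x, "in set:", member(x))
```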

Linear Programming for Large-Scale Markov Decision Problems

no code implementations 27 Feb 2014 Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost.

Bounding Embeddings of VC Classes into Maximum Classes

no code implementations 29 Jan 2014 J. Hyam Rubinstein, Benjamin I. P. Rubinstein, Peter L. Bartlett

The most promising approach to positively resolving the conjecture is by embedding general VC classes into maximum classes without super-linear increase to their VC dimensions, as such embeddings would extend the known compression schemes to all VC classes.

Learning Theory

A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption

no code implementations 1 Oct 2018 Peter L. Bartlett, Victor Gabillon, Michal Valko

The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ of the function evaluation and 2) the local smoothness, $d$, of the function.

Gen-Oja: A Two-time-scale approach for Streaming CCA

no code implementations 20 Nov 2018 Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

no code implementations 20 Dec 2018 Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright

We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback.
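
As a hedged illustration of the derivative-free approach (not the paper's exact estimator), the sketch below runs two-point zeroth-order policy search on a toy discrete-time LQR instance: the cost of a static feedback gain is evaluated by simulation, and the gradient is approximated from random perturbations. The dynamics, horizon, smoothing radius, and step size are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])          # dynamics x_{t+1} = A x_t + B u_t
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)
x0s = rng.normal(size=(10, 2))                  # fixed evaluation initial states

def cost(K, horizon=50):
    total = 0.0
    for x0 in x0s:
        x = x0.copy()
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total / len(x0s)

K = np.zeros((1, 2))                            # static feedback u = -K x
r, lr = 0.05, 1e-4                              # smoothing radius and step size (illustrative)
for t in range(400):
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)                      # random search direction
    g_hat = (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U   # two-point estimate
    K -= lr * g_hat
    if t % 100 == 0:
        print(f"iter {t:3d}  simulated cost {cost(K):.2f}")
```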

Horizon-Independent Minimax Linear Regression

no code implementations NeurIPS 2018 Alan Malek, Peter L. Bartlett

We consider online linear regression: at each round, an adversary reveals a covariate vector, the learner predicts a real value, the adversary reveals a label, and the learner suffers the squared prediction error.
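
The protocol itself is easy to write down; in the sketch below a simple ridge-style follow-the-regularized-leader predictor stands in for the paper's minimax strategy (it is not that strategy), and the data stream is a noisy linear model chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lam = 3, 500, 1.0
A = lam * np.eye(d)            # running regularized Gram matrix
b = np.zeros(d)                # running sum of y_t * x_t
total_loss = 0.0
w_true = rng.normal(size=d)    # the "adversary" here is just a noisy linear model

for t in range(T):
    x_t = rng.normal(size=d)                   # adversary reveals a covariate
    y_hat = np.linalg.solve(A, b) @ x_t        # learner predicts a real value
    y_t = w_true @ x_t + 0.1 * rng.normal()    # adversary reveals the label
    total_loss += (y_hat - y_t) ** 2           # learner suffers the squared error
    A += np.outer(x_t, x_t)                    # update the ridge statistics
    b += y_t * x_t

print("cumulative squared loss:", total_loss)
```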

regression

Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation

no code implementations NeurIPS 2018 Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

Alternating minimization for dictionary learning with random initialization

no code implementations NeurIPS 2017 Niladri Chatterji, Peter L. Bartlett

However, in contrast to previous theoretical analyses for this problem, we replace a condition on the operator norm (that is, the largest magnitude singular value) of the true underlying dictionary $A^*$ with a condition on the matrix infinity norm (that is, the largest magnitude term).

Dictionary Learning

Acceleration and Averaging in Stochastic Descent Dynamics

no code implementations NeurIPS 2017 Walid Krichene, Peter L. Bartlett

We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions.

Adaptive Averaging in Accelerated Descent Dynamics

no code implementations NeurIPS 2016 Walid Krichene, Alexandre Bayen, Peter L. Bartlett

This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate $\eta(t)$, and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights $w(t)$.

Efficient Minimax Strategies for Square Loss Games

no code implementations NeurIPS 2014 Wouter M. Koolen, Alan Malek, Peter L. Bartlett

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance.

Density Estimation

How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal

no code implementations NeurIPS 2013 Jacob Abernethy, Peter L. Bartlett, Rafael Frongillo, Andre Wibisono

We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor.

Information-theoretic lower bounds on the oracle complexity of convex optimization

no code implementations NeurIPS 2009 Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar

The extensive use of convex optimization in machine learning and statistics makes it critical to understand the fundamental computational limits of learning and estimation.

Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

no code implementations NeurIPS 2007 Ambuj Tewari, Peter L. Bartlett

OLP is closely related to an algorithm proposed by Burnetas and Katehakis, with four key differences: OLP is simpler, it does not require knowledge of the supports of transition probabilities, the proof of its regret bound is simpler, but our regret bound is a constant factor larger than the regret of their algorithm.

Large-Scale Markov Decision Problems via the Linear Programming Dual

no code implementations 6 Jan 2019 Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek

Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class.

Quantitative Weak Convergence for Discrete Stochastic Processes

no code implementations 3 Feb 2019 Xiang Cheng, Peter L. Bartlett, Michael I. Jordan

In this paper, we establish quantitative convergence in $W_2$ for a family of Langevin-like stochastic processes that includes stochastic gradient descent and related gradient-based algorithms.

Testing Markov Chains without Hitting

no code implementations 6 Feb 2019 Yeshwanth Cherapanamjeri, Peter L. Bartlett

We study the problem of identity testing of Markov chains.

Fast Mean Estimation with Sub-Gaussian Rates

1 code implementation 6 Feb 2019 Yeshwanth Cherapanamjeri, Nicolas Flammarion, Peter L. Bartlett

We propose an estimator for the mean of a random vector in $\mathbb{R}^d$ that can be computed in time $O(n^4+n^2d)$ for $n$ i.i.d. samples and that has error bounds matching the sub-Gaussian case.

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

no code implementations 24 May 2019 Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information.

Multi-Armed Bandits

Langevin Monte Carlo without smoothness

no code implementations 30 May 2019 Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett

Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.

Benign Overfitting in Linear Regression

no code implementations 26 Jun 2019 Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler

Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction.
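
The object studied here, the minimum-norm interpolating least-squares solution, can be illustrated directly; in the toy sketch below (dimensions, covariance, and noise are arbitrary choices) it fits noisy training labels exactly while its test error is reported for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 2000
scales = 1.0 / np.sqrt(1 + np.arange(d))        # decaying feature scales (illustrative covariance)
w_star = np.zeros(d)
w_star[:5] = 1.0                                # sparse "true" signal

X = rng.normal(size=(n, d)) * scales
y = X @ w_star + 0.5 * rng.normal(size=n)       # noisy training labels

w_mn = X.T @ np.linalg.solve(X @ X.T, y)        # minimum-norm interpolating solution (d > n)

X_test = rng.normal(size=(1000, d)) * scales
print("train error:", np.mean((X @ w_mn - y) ** 2))                      # ~0: perfect fit to noisy data
print("test  error:", np.mean((X_test @ w_mn - X_test @ w_star) ** 2))
```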

regression

Stochastic Gradient and Langevin Processes

no code implementations ICML 2020 Xiang Cheng, Dong Yin, Peter L. Bartlett, Michael I. Jordan

We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation.

Bayesian Robustness: A Nonasymptotic Viewpoint

no code implementations 27 Jul 2019 Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan

We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers.

Binary Classification, regression

High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

no code implementations 28 Aug 2019 Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities.

An Efficient Sampling Algorithm for Non-smooth Composite Potentials

no code implementations 1 Oct 2019 Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett

We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function.

Infinite-Horizon Policy-Gradient Estimation

no code implementations 3 Jun 2011 Jonathan Baxter, Peter L. Bartlett

In this paper we introduce GPOMDP, a simulation-based algorithm for generating a {\em biased} estimate of the gradient of the {\em average reward} in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies.
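
A compact sketch of a GPOMDP-style estimator is given below: a discounted eligibility trace of policy score functions is combined with observed rewards to form a running (biased) estimate of the gradient of the average reward. The two-state chain, softmax tabular policy, and discount $\beta$ are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))          # tabular softmax policy parameters

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def step(s, a):
    # Toy controlled chain: action 0 tends to stay, action 1 tends to switch states.
    stay = 0.9 if a == 0 else 0.2
    s_next = s if rng.random() < stay else 1 - s
    reward = 1.0 if s_next == 1 else 0.0         # being in state 1 is rewarded
    return s_next, reward

def gpomdp_estimate(T=100_000, beta=0.95):
    s = 0
    z = np.zeros_like(theta)                     # discounted eligibility trace of score functions
    delta = np.zeros_like(theta)                 # running gradient estimate
    for t in range(T):
        p = policy(s)
        a = rng.choice(n_actions, p=p)
        score = -np.outer(np.eye(n_states)[s], p)   # d/d theta of log pi(a | s)
        score[s, a] += 1.0
        s, r = step(s, a)
        z = beta * z + score
        delta += (r * z - delta) / (t + 1)       # running average of reward * trace
    return delta

print("estimated (biased) average-reward gradient:\n", gpomdp_estimate())
```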

Hebbian Synaptic Modifications in Spiking Neurons that Learn

no code implementations 17 Nov 2019 Peter L. Bartlett, Jonathan Baxter

In this paper, we derive a new model of synaptic plasticity, based on recent algorithms for reinforcement learning (in which an agent attempts to learn appropriate actions to maximize its long-term average reward).

Reinforcement Learning (RL)

Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

no code implementations 11 Dec 2019 Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior.

Oracle Lower Bounds for Stochastic Gradient Sampling Algorithms

no code implementations 1 Feb 2020 Niladri S. Chatterji, Peter L. Bartlett, Philip M. Long

We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed.

Self-Distillation Amplifies Regularization in Hilbert Space

no code implementations NeurIPS 2020 Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

Knowledge distillation introduced in the deep learning context is a method to transfer knowledge from one architecture to another.
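
One round of self-distillation in a Hilbert-space (kernel ridge) setting can be sketched as follows: fit kernel ridge regression to noisy labels, then refit the same model to its own fitted values and observe the additional shrinkage. The kernel, bandwidth, and regularization below are illustrative, and this is only a toy version of the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = np.sort(rng.uniform(-3, 3, size=n))
y = np.sin(x) + 0.3 * rng.normal(size=n)

K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # RBF Gram matrix (bandwidth illustrative)
lam = 0.1

def krr_fit(targets):
    alpha = np.linalg.solve(K + lam * np.eye(n), targets)
    return K @ alpha                                 # fitted values on the training inputs

f1 = krr_fit(y)        # teacher: fit the noisy labels
f2 = krr_fit(f1)       # student: fit the teacher's own predictions (self-distillation)

print("||f1||:", np.linalg.norm(f1), " ||f2||:", np.linalg.norm(f2))
print("student is a further-shrunk version of the teacher:", np.linalg.norm(f2) < np.linalg.norm(f1))
```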

Knowledge Distillation, L2 Regularization

On Thompson Sampling with Langevin Algorithms

no code implementations ICML 2020 Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.

Thompson Sampling

On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration

no code implementations 9 Apr 2020 Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
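
A minimal sketch of the recursion in question, fixed-step-size linear stochastic approximation with Polyak-Ruppert averaging for solving $\bar{A}\theta = \bar{b}$ from noisy observations, is given below; the matrices, noise level, and step size are illustrative, with $\bar{A}$ chosen positive definite so the recursion is stable.

```python
import numpy as np

rng = np.random.default_rng(0)
A_bar = np.array([[2.0, 0.3], [0.1, 1.0]])
b_bar = np.array([1.0, -1.0])
theta_star = np.linalg.solve(A_bar, b_bar)

eta, T = 0.05, 50_000
theta = np.zeros(2)
theta_avg = np.zeros(2)
for t in range(1, T + 1):
    A_t = A_bar + 0.5 * rng.normal(size=(2, 2))       # noisy observation of A_bar
    b_t = b_bar + 0.5 * rng.normal(size=2)            # noisy observation of b_bar
    theta = theta - eta * (A_t @ theta - b_t)         # fixed-step-size LSA iterate
    theta_avg += (theta - theta_avg) / t              # Polyak-Ruppert running average

print("last iterate error    :", np.linalg.norm(theta - theta_star))
print("averaged iterate error:", np.linalg.norm(theta_avg - theta_star))
```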

Optimal Robust Linear Regression in Nearly Linear Time

no code implementations 16 Jul 2020 Yeshwanth Cherapanamjeri, Efe Aras, Nilesh Tripuraneni, Michael I. Jordan, Nicolas Flammarion, Peter L. Bartlett

We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted.

regression

Failures of model-dependent generalization bounds for least-norm interpolation

no code implementations 16 Oct 2020 Peter L. Bartlett, Philip M. Long

We consider bounds on the generalization performance of the least-norm linear regressor, in the over-parameterized regime where it can interpolate the data.

Generalization Bounds, Learning Theory

Optimal Mean Estimation without a Variance

no code implementations 24 Nov 2020 Yeshwanth Cherapanamjeri, Nilesh Tripuraneni, Peter L. Bartlett, Michael I. Jordan

Concretely, given a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$ over $\mathbb{R}^d$ with mean $\mu$ which satisfies the following \emph{weak-moment} assumption for some ${\alpha \in [0, 1]}$: \begin{equation*} \forall \|v\| = 1: \mathbb{E}_{X \thicksim \mathcal{D}}[\lvert \langle X - \mu, v\rangle \rvert^{1 + \alpha}] \leq 1, \end{equation*} and given a target failure probability, $\delta$, our goal is to design an estimator which attains the smallest possible confidence interval as a function of $n, d,\delta$.

When does gradient descent with logistic loss find interpolating two-layer networks?

no code implementations 4 Dec 2020 Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss.

Binary Classification

When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?

no code implementations 9 Feb 2021 Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence.

Deep learning: a statistical viewpoint

no code implementations 16 Mar 2021 Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin

We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting.

Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

no code implementations 17 Mar 2021 Lin Chen, Bruno Scherrer, Peter L. Bartlett

In this regime, for any $q\in[\gamma^{2}, 1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$.

Off-policy evaluation

Agnostic learning with unknown utilities

no code implementations 17 Apr 2021 Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt

This raises an interesting question whether learning is even possible in our setup, given that obtaining a generalizable estimate of utility $u^*$ might not be possible from finitely many samples.

Preference learning along multiple criteria: A game-theoretic perspective

no code implementations NeurIPS 2020 Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright

Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner for the overall preferences.

Autonomous Driving

Adversarial Examples in Multi-Layer Random ReLU Networks

no code implementations NeurIPS 2021 Peter L. Bartlett, Sébastien Bubeck, Yeshwanth Cherapanamjeri

We consider the phenomenon of adversarial examples in ReLU networks with independent Gaussian parameters.

The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks

no code implementations 25 Aug 2021 Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data.

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

no code implementations 23 Dec 2021 Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms.

Model Selection

Optimal variance-reduced stochastic approximation in Banach spaces

no code implementations 21 Jan 2022 Wenlong Mou, Koulik Khamaru, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.

Q-Learning

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data

no code implementations 11 Feb 2022 Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent.

Random Feature Amplification: Feature Learning and Generalization in Neural Networks

no code implementations 15 Feb 2022 Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

We consider data with binary labels that are generated by an XOR-like function of the input features.

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

no code implementations 26 Sep 2022 Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.

Causal Inference

The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

no code implementations 4 Oct 2022 Peter L. Bartlett, Philip M. Long, Olivier Bousquet

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems.
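
The SAM update, as commonly stated, can be sketched in a few lines: perturb the weights by $\rho$ times the normalized gradient, evaluate the gradient at the perturbed point, and apply it at the original point. The quadratic toy loss and hyperparameters below are illustrative, not the paper's setting.

```python
import numpy as np

H = np.diag([10.0, 1.0])                 # toy quadratic loss 0.5 * w^T H w
loss = lambda w: 0.5 * w @ H @ w
grad = lambda w: H @ w

w = np.array([1.0, 1.0])
rho, eta = 0.05, 0.05                    # perturbation radius and step size (illustrative)
for t in range(100):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step to the "sharp" neighbor
    w = w - eta * grad(w + eps)                   # SAM step: gradient taken at the perturbed point
    if t % 25 == 0:
        print(f"iter {t:3d}  loss {loss(w):.5f}")
```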

Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

no code implementations 13 Oct 2022 Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.

Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency

no code implementations 16 Jan 2023 Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett

When it is violated, the classical semi-parametric efficiency bound can easily become infinite, so that the instance-optimal risk depends on the function class used to model the regression function.

regression

Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization

no code implementations 2 Mar 2023 Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro

Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization.

Prediction, Learning, Uniform Convergence, and Scale-sensitive Dimensions

no code implementations 21 Apr 2023 Peter L. Bartlett, Philip M. Long

We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension.

Trained Transformers Learn Linear Models In-Context

no code implementations 16 Jun 2023 Ruiqi Zhang, Spencer Frei, Peter L. Bartlett

We show that although gradient flow succeeds at finding a global minimum in this setting, the trained transformer is still brittle under mild covariate shifts.

In-Context Learning, regression

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

no code implementations 12 Oct 2023 Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett

Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters.
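
For concreteness, the sketch below constructs the kind of in-context linear-regression prompts typically used in this line of work: each pretraining task is a random weight vector, and a prompt is a sequence of $(x, y)$ examples from that task followed by a query whose label must be predicted from context alone. The dimensions, context length, task count, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, context_len, n_tasks = 8, 16, 1000

def sample_prompt():
    w_task = rng.normal(size=d)                          # a fresh linear-regression task
    X = rng.normal(size=(context_len + 1, d))            # context inputs plus one query input
    y = X @ w_task + 0.1 * rng.normal(size=context_len + 1)
    prompt = np.concatenate([X, y[:, None]], axis=1)     # rows are (x_i, y_i) pairs
    prompt[-1, -1] = 0.0                                 # hide the query label from the model
    return prompt, y[-1]                                 # the model should predict y[-1] in context

prompts, targets = zip(*(sample_prompt() for _ in range(n_tasks)))
print("pretraining set:", len(prompts), "prompts of shape", prompts[0].shape)
```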

In-Context Learning, regression

On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension

no code implementations 28 Jan 2024 Saptarshi Chakraborty, Peter L. Bartlett

In this paper, we attempt to bridge the gap between the theory and practice of GANs and their bidirectional variant, Bi-directional GANs (BiGANs), by deriving statistical guarantees on the estimated densities in terms of the intrinsic dimension of the data and the latent space.

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

no code implementations 22 Feb 2024 Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett

We study the \emph{in-context learning} (ICL) ability of a \emph{Linear Transformer Block} (LTB) that combines a linear attention component and a linear multi-layer perceptron (MLP) component.

In-Context Learning

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

no code implementations 24 Feb 2024 Jingfeng Wu, Peter L. Bartlett, Matus Telgarsky, Bin Yu

We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data, where the constant stepsize $\eta$ is so large that the loss initially oscillates.
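
The phenomenon is easy to reproduce in a toy example: the sketch below runs GD with a deliberately large constant step size on logistic loss over synthetic linearly separable data, where the loss is typically non-monotone in the early iterations before it eventually decreases. The data construction and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2
w_star = np.array([1.0, 1.0]) / np.sqrt(2)
X = rng.normal(size=(n, d))
X += np.sign(X @ w_star)[:, None] * w_star   # push points off the boundary: margin >= 1
y = np.sign(X @ w_star)                      # linearly separable labels by construction

def sigmoid(z):
    ez = np.exp(-np.abs(z))                  # numerically stable sigmoid
    return np.where(z >= 0, 1.0 / (1.0 + ez), ez / (1.0 + ez))

def loss(w):
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))     # logistic loss

def grad(w):
    return -(X.T @ (sigmoid(-y * (X @ w)) * y)) / n

eta = 50.0                                   # deliberately large constant step size
w = np.zeros(d)
for t in range(60):
    w -= eta * grad(w)
    if t < 8 or t % 10 == 0:
        print(f"iter {t:2d}  loss {loss(w):.4f}")        # typically non-monotone early, then decreasing
```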

General Classification

A Statistical Analysis of Wasserstein Autoencoders for Intrinsically Low-dimensional Data

no code implementations 24 Feb 2024 Saptarshi Chakraborty, Peter L. Bartlett

To bridge the gap between the theory and practice of WAEs, in this paper, we show that WAEs can learn the data distributions when the network architectures are properly chosen.
