no code implementations • 12 Jun 2024 • Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett

The typical training of neural networks using large-stepsize gradient descent (GD) under the logistic loss exhibits two distinct phases: the empirical risk oscillates in the first phase and then decreases monotonically in the second.

no code implementations • 12 Jun 2024 • Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee

Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow.

no code implementations • 24 Feb 2024 • Jingfeng Wu, Peter L. Bartlett, Matus Telgarsky, Bin Yu

We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data, where the constant stepsize $\eta$ is so large that the loss initially oscillates.
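A minimal numerical sketch of this setting (the data, stepsize, and iteration count are illustrative choices, not the paper's experiments):

```python
import numpy as np

# Linearly separable data with labels folded into the features (rows are y_i * x_i),
# so the logistic loss is ell(w) = mean(log(1 + exp(-Z @ w))).
Z = np.array([[2.0, 1.0], [2.0, -1.0], [0.5, 0.2]])  # separable: Z @ (1, 0) > 0

def loss(w):
    return np.mean(np.log1p(np.exp(-Z @ w)))

def grad(w):
    s = 1.0 / (1.0 + np.exp(Z @ w))   # sigmoid of the negative margins
    return -(Z.T @ s) / len(Z)

eta = 20.0                 # deliberately large constant stepsize
w = np.zeros(2)
losses = []
for t in range(200):
    losses.append(loss(w))
    w -= eta * grad(w)

# The loss need not decrease monotonically in the early rounds, but it is
# eventually driven toward zero as the margin grows.
print(losses[0], losses[-1])
```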

no code implementations • 24 Feb 2024 • Saptarshi Chakraborty, Peter L. Bartlett

To bridge the gap between the theory and practice of WAEs, in this paper, we show that WAEs can learn the data distributions when the network architectures are properly chosen.

no code implementations • 22 Feb 2024 • Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett

We study the \emph{in-context learning} (ICL) ability of a \emph{Linear Transformer Block} (LTB) that combines a linear attention component and a linear multi-layer perceptron (MLP) component.

no code implementations • 28 Jan 2024 • Saptarshi Chakraborty, Peter L. Bartlett

In this paper, we attempt to bridge the gap between the theory and practice of GANs and their bidirectional variant, Bi-directional GANs (BiGANs), by deriving statistical guarantees on the estimated densities in terms of the intrinsic dimension of the data and the latent space.

no code implementations • 12 Oct 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett

Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters.

1 code implementation • 21 Sep 2023 • Philip M. Long, Peter L. Bartlett

Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value.

no code implementations • 16 Jun 2023 • Ruiqi Zhang, Spencer Frei, Peter L. Bartlett

We show that although gradient flow succeeds at finding a global minimum in this setting, the trained transformer is still brittle under mild covariate shifts.

no code implementations • 21 Apr 2023 • Peter L. Bartlett, Philip M. Long

We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension.

no code implementations • 2 Mar 2023 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro

Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization.

no code implementations • 16 Jan 2023 • Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett

When it is violated, the classical semi-parametric efficiency bound can easily become infinite, so that the instance-optimal risk depends on the function class used to model the regression function.

no code implementations • 13 Oct 2022 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.

no code implementations • 4 Oct 2022 • Peter L. Bartlett, Philip M. Long, Olivier Bousquet

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems.

no code implementations • 26 Sep 2022 • Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.

no code implementations • 15 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

We consider data with binary labels that are generated by an XOR-like function of the input features.
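For concreteness, here is a hand-built two-layer ReLU network that realizes an XOR-like labeling exactly on the four cluster centers (the weights are chosen by inspection for illustration; they are not the learned solution studied in the paper):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

# XOR-like labels: y = sign(x1 * x2) on the four cluster centers.
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

# Two-layer ReLU network with four hidden units and output weights (+1, +1, -1, -1).
def f(x):
    return (relu(x[0] + x[1] - 1) + relu(-x[0] - x[1] - 1)
            - relu(x[0] - x[1] - 1) - relu(-x[0] + x[1] - 1))

preds = np.sign([f(x) for x in X])
print(preds)

# No linear classifier w.x + b can match these labels: the two positive centers
# are negatives of each other, and so are the two negative centers.
```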

no code implementations • 11 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett

Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent.

no code implementations • 21 Jan 2022 • Wenlong Mou, Koulik Khamaru, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.

no code implementations • 23 Dec 2021 • Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms.

no code implementations • 25 Aug 2021 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data.

no code implementations • NeurIPS 2021 • Peter L. Bartlett, Sébastien Bubeck, Yeshwanth Cherapanamjeri

We consider the phenomenon of adversarial examples in ReLU networks with independent gaussian parameters.

no code implementations • NeurIPS 2021 • Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan

We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode.

no code implementations • NeurIPS 2020 • Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright

Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner in terms of overall preferences.

no code implementations • 17 Apr 2021 • Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt

This raises the interesting question of whether learning is even possible in our setup, given that a generalizable estimate of the utility $u^*$ might not be obtainable from finitely many samples.

no code implementations • 17 Mar 2021 • Lin Chen, Bruno Scherrer, Peter L. Bartlett

In this regime, for any $q\in[\gamma^{2}, 1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$.

no code implementations • 16 Mar 2021 • Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin

We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting.

no code implementations • 9 Feb 2021 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence.

no code implementations • 4 Dec 2020 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss.

no code implementations • 24 Nov 2020 • Yeshwanth Cherapanamjeri, Nilesh Tripuraneni, Peter L. Bartlett, Michael I. Jordan

Concretely, given a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$ over $\mathbb{R}^d$ with mean $\mu$ which satisfies the following \emph{weak-moment} assumption for some ${\alpha \in [0, 1]}$: \begin{equation*} \forall \|v\| = 1: \mathbb{E}_{X \thicksim \mathcal{D}}[\lvert \langle X - \mu, v\rangle \rvert^{1 + \alpha}] \leq 1, \end{equation*} and given a target failure probability, $\delta$, our goal is to design an estimator which attains the smallest possible confidence interval as a function of $n, d,\delta$.
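For intuition, the classical one-dimensional median-of-means estimator is the standard baseline that this line of work refines for the multivariate, weak-moment setting (the heavy-tailed test distribution and block count below are illustrative choices):

```python
import numpy as np

def median_of_means(x, k):
    # Split the samples into k blocks, average each block,
    # and return the median of the block means.
    blocks = np.array_split(x, k)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(0)
x = rng.standard_t(df=2.5, size=10000)   # heavy-tailed, true mean 0

mom = median_of_means(x, k=20)
print(x.mean(), mom)   # both estimates should be near the true mean 0
```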

no code implementations • 16 Oct 2020 • Peter L. Bartlett, Philip M. Long

We consider bounds on the generalization performance of the least-norm linear regressor, in the over-parameterized regime where it can interpolate the data.

no code implementations • 16 Jul 2020 • Yeshwanth Cherapanamjeri, Efe Aras, Nilesh Tripuraneni, Michael I. Jordan, Nicolas Flammarion, Peter L. Bartlett

We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted.

no code implementations • 9 Apr 2020 • Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.

no code implementations • ICML 2020 • Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.

no code implementations • NeurIPS 2020 • Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

Knowledge distillation introduced in the deep learning context is a method to transfer knowledge from one architecture to another.

no code implementations • 1 Feb 2020 • Niladri S. Chatterji, Peter L. Bartlett, Philip M. Long

We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed.

no code implementations • 11 Dec 2019 • Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior.

no code implementations • 17 Nov 2019 • Peter L. Bartlett, Jonathan Baxter

In this paper, we derive a new model of synaptic plasticity, based on recent algorithms for reinforcement learning (in which an agent attempts to learn appropriate actions to maximize its long-term average reward).

1 code implementation • 9 Oct 2019 • Tan Nguyen, Nan Ye, Peter L. Bartlett

Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull.

no code implementations • 1 Oct 2019 • Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett

We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function.
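A crude illustration of this composite setting, using a plain subgradient variant of the unadjusted Langevin algorithm on a one-dimensional instance; this is only meant to make the target concrete and is not the paper's proposed scheme:

```python
import numpy as np

# Target: p(x) ∝ exp(-f(x) - g(x)) with f(x) = x**2 / 2 (smooth, strongly
# convex) and g(x) = |x| (convex and Lipschitz but non-smooth).
grad_f = lambda x: x
subgrad_g = lambda x: np.sign(x)

rng = np.random.default_rng(0)
eta, n_steps, burn_in = 0.01, 50000, 5000
noise = np.sqrt(2 * eta) * rng.standard_normal(n_steps)

x, samples = 0.0, []
for t in range(n_steps):
    x = x - eta * (grad_f(x) + subgrad_g(x)) + noise[t]
    if t >= burn_in:
        samples.append(x)

samples = np.asarray(samples)
print(samples.mean(), samples.var())  # mean near 0; variance well below 1,
                                      # since the |x| term lightens the tails
```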

no code implementations • 28 Aug 2019 • Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities.

no code implementations • 27 Jul 2019 • Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan

We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers.

no code implementations • 25 Jul 2019 • Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett

We present an improved analysis of the Euler-Maruyama discretization of the Langevin diffusion.

no code implementations • ICML 2020 • Xiang Cheng, Dong Yin, Peter L. Bartlett, Michael I. Jordan

We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation.

no code implementations • 26 Jun 2019 • Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler

Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction.

no code implementations • 30 May 2019 • Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett

Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.

no code implementations • 24 May 2019 • Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information.

1 code implementation • 6 Feb 2019 • Yeshwanth Cherapanamjeri, Nicolas Flammarion, Peter L. Bartlett

We propose an estimator for the mean of a random vector in $\mathbb{R}^d$ that can be computed in time $O(n^4+n^2d)$ for $n$ i.i.d. samples and that has error bounds matching the sub-Gaussian case.

no code implementations • 6 Feb 2019 • Yeshwanth Cherapanamjeri, Peter L. Bartlett

We study the problem of identity testing of Markov chains.

no code implementations • 3 Feb 2019 • Xiang Cheng, Peter L. Bartlett, Michael I. Jordan

In this paper, we prove quantitative convergence in $W_2$ for a family of Langevin-like stochastic processes that includes stochastic gradient descent and related gradient-based algorithms.

no code implementations • 6 Jan 2019 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek

Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class.

no code implementations • 20 Dec 2018 • Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright

We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback.

no code implementations • NeurIPS 2018 • Alan Malek, Peter L. Bartlett

We consider online linear regression: at each round, an adversary reveals a covariate vector, the learner predicts a real value, the adversary reveals a label, and the learner suffers the squared prediction error.
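This round structure can be made concrete as a short loop. The forecaster below is a standard online ridge baseline in the spirit of the Vovk-Azoury-Warmuth predictor, with synthetic data and all constants chosen for illustration; it is not the minimax strategy analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, lam = 2, 500, 1.0
w_star = np.array([0.5, -1.0])        # hypothetical ground-truth comparator

A = lam * np.eye(d)                   # regularized second-moment matrix
b = np.zeros(d)
learner_loss = 0.0
X, Y = [], []
for t in range(T):
    x = rng.standard_normal(d)                   # adversary reveals a covariate
    A += np.outer(x, x)
    y_hat = x @ np.linalg.solve(A, b)            # learner predicts (x_t already in A)
    y = x @ w_star + 0.1 * rng.standard_normal() # adversary reveals the label
    learner_loss += (y_hat - y) ** 2             # learner suffers squared error
    b += y * x
    X.append(x); Y.append(y)

# Loss of the best fixed linear predictor in hindsight.
X, Y = np.array(X), np.array(Y)
w_best, *_ = np.linalg.lstsq(X, Y, rcond=None)
best_loss = ((X @ w_best - Y) ** 2).sum()
regret = learner_loss - best_loss
print(regret)
```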

no code implementations • NeurIPS 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

no code implementations • 20 Nov 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

no code implementations • 1 Oct 2018 • Peter L. Bartlett, Victor Gabillon, Michal Valko

The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ of the function evaluation and 2) the local smoothness, $d$, of the function.

no code implementations • 22 May 2018 • Vidya Muthukumar, Mitas Ray, Anant Sahai, Peter L. Bartlett

We introduce algorithms for online, full-information prediction that are competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings.

no code implementations • 4 May 2018 • Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.

no code implementations • 13 Apr 2018 • Peter L. Bartlett, Steven N. Evans, Philip M. Long

This implies that $h$ can be represented to any accuracy by a deep residual network whose nonlinear layers compute functions with a small Lipschitz constant.

no code implementations • 27 Feb 2018 • Aldo Pacchiano, Niladri S. Chatterji, Peter L. Bartlett

We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.

no code implementations • ICML 2018 • Peter L. Bartlett, David P. Helmbold, Philip M. Long

We provide polynomial bounds on the number of iterations for gradient descent to approximate the least squares matrix $\Phi$, in the case where the initial hypothesis $\Theta_1 = \cdots = \Theta_L = I$ has excess loss bounded by a small enough constant.

no code implementations • ICML 2018 • Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion.

no code implementations • NeurIPS 2017 • Walid Krichene, Peter L. Bartlett

We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions.

no code implementations • NeurIPS 2017 • Niladri Chatterji, Peter L. Bartlett

However, in contrast to previous theoretical analyses for this problem, we replace a condition on the operator norm (that is, the largest magnitude singular value) of the true underlying dictionary $A^*$ with a condition on the matrix infinity norm (that is, the largest magnitude term).

no code implementations • NeurIPS 2017 • Yasin Abbasi, Peter L. Bartlett, Victor Gabillon

We study minimax strategies for the online prediction problem with expert advice.

no code implementations • NeurIPS 2017 • Niladri S. Chatterji, Peter L. Bartlett

We present theoretical guarantees for an alternating minimization algorithm for the dictionary learning/sparse coding problem.

no code implementations • 19 Jul 2017 • Walid Krichene, Peter L. Bartlett

We discuss the interaction between the parameters of the dynamics (learning rate and averaging weights) and the covariation of the noise process, and show, in particular, how the asymptotic rate of covariation affects the choice of parameters and, ultimately, the convergence rate.

no code implementations • 12 Jul 2017 • Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan

We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave.
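A bare-bones discretization for a standard Gaussian target, only to show the two coupled position/velocity variables; the paper's analysis uses a sharper integrator than this naive Euler-type scheme:

```python
import numpy as np

# Underdamped Langevin diffusion for U(x) = x**2 / 2, whose x-marginal at
# stationarity is N(0, 1):
#   dx = v dt,   dv = -(gamma*v + U'(x)) dt + sqrt(2*gamma) dB_t
grad_U = lambda x: x

rng = np.random.default_rng(0)
gamma, h, n_steps, burn_in = 2.0, 0.01, 200000, 20000
noise = np.sqrt(2 * gamma * h) * rng.standard_normal(n_steps)

x, v, xs = 0.0, 0.0, []
for t in range(n_steps):
    v += -h * (gamma * v + grad_U(x)) + noise[t]
    x += h * v
    if t >= burn_in:
        xs.append(x)

xs = np.asarray(xs)
print(xs.mean(), xs.var())   # near 0 and near 1 when the step size h is small
```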

no code implementations • ICML 2017 • Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

For activation functions that are also smooth, we show $\mathit{local~linear~convergence}$ guarantees of gradient descent under a resampling rule.

no code implementations • 8 Mar 2017 • Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian

We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function.

no code implementations • NeurIPS 2016 • Walid Krichene, Alexandre Bayen, Peter L. Bartlett

This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate $\eta(t)$, and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights $w(t)$.

19 code implementations • 9 Nov 2016 • Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel

The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP.

no code implementations • 19 Oct 2016 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek

We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.

no code implementations • 26 May 2016 • Xiang Cheng, Farbod Roosta-Khorasani, Stefan Palombo, Peter L. Bartlett, Michael W. Mahoney

We consider first order gradient methods for effectively optimizing a composite objective in the form of a sum of smooth and, potentially, non-smooth functions.

no code implementations • NeurIPS 2015 • Wouter M. Koolen, Alan Malek, Peter L. Bartlett, Yasin Abbasi

We consider an adversarial formulation of the problem of predicting a time series with square loss.

no code implementations • NeurIPS 2015 • Walid Krichene, Alexandre Bayen, Peter L. Bartlett

We study accelerated mirror descent dynamics in continuous and discrete time.

no code implementations • NeurIPS 2014 • Alex Kantchelian, Michael C. Tschantz, Ling Huang, Peter L. Bartlett, Anthony D. Joseph, J. D. Tygar

We present the Convex Polytope Machine (CPM), a novel non-linear learning algorithm for large-scale binary classification tasks.

no code implementations • NeurIPS 2014 • Wouter M. Koolen, Alan Malek, Peter L. Bartlett

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance.

no code implementations • 27 Feb 2014 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost.

no code implementations • 29 Jan 2014 • J. Hyam Rubinstein, Benjamin I. P. Rubinstein, Peter L. Bartlett

The most promising approach to positively resolving the conjecture is by embedding general VC classes into maximum classes without super-linear increase to their VC dimensions, as such embeddings would extend the known compression schemes to all VC classes.

no code implementations • NeurIPS 2013 • Yasin Abbasi, Peter L. Bartlett, Varun Kanade, Yevgeny Seldin, Csaba Szepesvari

The goal of the learning algorithm is to choose a path that minimizes the loss while traversing from the start to finish node.

no code implementations • NeurIPS 2013 • Jacob Abernethy, Peter L. Bartlett, Rafael Frongillo, Andre Wibisono

We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor.

no code implementations • 3 Jun 2011 • Jonathan Baxter, Peter L. Bartlett

In this paper we introduce GPOMDP, a simulation-based algorithm for generating a {\em biased} estimate of the gradient of the {\em average reward} in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies.
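The core of GPOMDP is a discounted eligibility trace of score functions. The sketch below estimates the average-reward gradient for a softmax policy on a trivial one-state, two-action problem; the problem instance and hyperparameters are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)            # softmax policy parameters over two actions
reward = np.array([0.0, 1.0])  # action 1 is better
beta = 0.9                     # GPOMDP's bias/variance trade-off parameter

def policy(theta):
    p = np.exp(theta - theta.max())
    return p / p.sum()

z = np.zeros(2)        # eligibility trace of score functions
delta = np.zeros(2)    # running (biased) gradient estimate
T = 50000
for t in range(T):
    p = policy(theta)
    a = rng.choice(2, p=p)
    score = -p
    score[a] += 1.0             # grad of log pi(a | theta) for the softmax
    z = beta * z + score
    delta += (reward[a] * z - delta) / (t + 1)

print(delta)   # estimate points away from the worse action's parameter
               # and toward the better action's parameter
```

At `theta = 0` the true average-reward gradient here is $(-0.25, 0.25)$, which the trace-based estimate approaches as $T$ grows.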

no code implementations • NeurIPS 2009 • Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar

The extensive use of convex optimization in machine learning and statistics makes such an understanding critical to understand fundamental computational limits of learning and estimation.

no code implementations • NeurIPS 2007 • Ambuj Tewari, Peter L. Bartlett

OLP is closely related to an algorithm proposed by Burnetas and Katehakis, with four key differences: OLP is simpler; it does not require knowledge of the supports of the transition probabilities; the proof of its regret bound is simpler; but its regret bound is a constant factor larger than that of their algorithm.
