no code implementations • 21 Apr 2023 • Peter L. Bartlett, Philip M. Long
We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension.
no code implementations • 2 Mar 2023 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have an implicit bias towards solutions which satisfy the Karush--Kuhn--Tucker (KKT) conditions for margin maximization.
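For reference, the margin-maximization problem and its KKT conditions mentioned above can be written, for a generic predictor $f(\theta;\cdot)$ and labels $y_i \in \{\pm 1\}$ (a standard statement, ignoring nonsmoothness; not a result specific to this paper), as \begin{equation*} \min_{\theta} \tfrac{1}{2}\|\theta\|^2 \quad \text{subject to} \quad y_i\, f(\theta; x_i) \ge 1 \ \ \forall i, \end{equation*} with the KKT conditions requiring multipliers $\lambda_i \ge 0$ such that $\theta = \sum_i \lambda_i\, y_i\, \nabla_\theta f(\theta; x_i)$ and $\lambda_i \big( y_i f(\theta; x_i) - 1 \big) = 0$ for all $i$.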
no code implementations • 2 Mar 2023 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro
We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are highly vulnerable to adversarial examples.
no code implementations • 16 Jan 2023 • Wenlong Mou, Peng Ding, Martin J. Wainwright, Peter L. Bartlett
When it is violated, the classical semi-parametric efficiency bound can easily become infinite, so that the instance-optimal risk depends on the function class used to model the regression function.
no code implementations • 13 Oct 2022 • Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data.
no code implementations • 4 Oct 2022 • Peter L. Bartlett, Philip M. Long, Olivier Bousquet
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems.
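A minimal sketch of one SAM step, assuming the standard two-step ascent/descent update on an $\ell_2$ ball of radius $\rho$ (the interface and hyperparameters below are illustrative, not taken from the paper):

```python
import numpy as np

def sam_step(w, grad_loss, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step.

    w         : current parameters (np.ndarray)
    grad_loss : function returning the gradient of the training loss at given parameters
    lr        : step size of the outer descent update
    rho       : radius of the perturbation neighbourhood
    """
    g = grad_loss(w)
    # Ascent: approximate worst-case perturbation within an l2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent: take a gradient step using the gradient at the perturbed parameters.
    return w - lr * grad_loss(w + eps)

# Example: minimize the quadratic L(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_loss=lambda v: v)
```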
no code implementations • 26 Sep 2022 • Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.
no code implementations • 15 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
We consider data with binary labels that are generated by an XOR-like function of the input features.
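One simple way to generate data of this kind, as an illustration only (the paper's precise cluster construction differs in its details), is to label points by the sign of the product of two informative coordinates:

```python
import numpy as np

def sample_xor_data(n, d, noise=0.1, seed=0):
    """Draw n points in R^d whose binary label is an XOR-like function of the
    first two coordinates: y = sign(x_1 * x_2).  The remaining coordinates are
    uninformative noise.  Illustrative only."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n, 2))      # which corner of the XOR pattern
    x = noise * rng.standard_normal((n, d))
    x[:, :2] += signs
    y = np.sign(x[:, 0] * x[:, 1])
    return x, y

X, y = sample_xor_data(n=200, d=10)
```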
no code implementations • 11 Feb 2022 • Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent.
no code implementations • 21 Jan 2022 • Wenlong Mou, Koulik Khamaru, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
no code implementations • 23 Dec 2021 • Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett
We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms.
no code implementations • 25 Aug 2021 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
The recent success of neural network models has shed light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data.
no code implementations • NeurIPS 2021 • Peter L. Bartlett, Sébastien Bubeck, Yeshwanth Cherapanamjeri
We consider the phenomenon of adversarial examples in ReLU networks with independent Gaussian parameters.
no code implementations • NeurIPS 2021 • Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan
We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode.
no code implementations • NeurIPS 2020 • Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright
Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner for the overall preferences.
no code implementations • 17 Apr 2021 • Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt
This raises the interesting question of whether learning is even possible in our setup, given that obtaining a generalizable estimate of the utility $u^*$ might not be possible from finitely many samples.
no code implementations • 17 Mar 2021 • Lin Chen, Bruno Scherrer, Peter L. Bartlett
In this regime, for any $q\in[\gamma^{2}, 1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$.
no code implementations • 16 Mar 2021 • Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin
We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting.
no code implementations • 9 Feb 2021 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence.
no code implementations • 4 Dec 2020 • Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss.
no code implementations • 24 Nov 2020 • Yeshwanth Cherapanamjeri, Nilesh Tripuraneni, Peter L. Bartlett, Michael I. Jordan
Concretely, given a sample $\mathbf{X} = \{X_i\}_{i = 1}^n$ from a distribution $\mathcal{D}$ over $\mathbb{R}^d$ with mean $\mu$ which satisfies the following \emph{weak-moment} assumption for some ${\alpha \in [0, 1]}$: \begin{equation*} \forall \|v\| = 1: \mathbb{E}_{X \thicksim \mathcal{D}}[\lvert \langle X - \mu, v\rangle \rvert^{1 + \alpha}] \leq 1, \end{equation*} and given a target failure probability, $\delta$, our goal is to design an estimator which attains the smallest possible confidence interval as a function of $n, d,\delta$.
no code implementations • 16 Oct 2020 • Peter L. Bartlett, Philip M. Long
We consider bounds on the generalization performance of the least-norm linear regressor, in the over-parameterized regime where it can interpolate the data.
no code implementations • 16 Jul 2020 • Yeshwanth Cherapanamjeri, Efe Aras, Nilesh Tripuraneni, Michael I. Jordan, Nicolas Flammarion, Peter L. Bartlett
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted.
no code implementations • 9 Apr 2020 • Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan
When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
no code implementations • ICML 2020 • Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan
The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.
no code implementations • NeurIPS 2020 • Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett
Knowledge distillation, introduced in the deep learning context, is a method for transferring knowledge from one architecture to another.
no code implementations • 1 Feb 2020 • Niladri S. Chatterji, Peter L. Bartlett, Philip M. Long
We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed.
no code implementations • 11 Dec 2019 • Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan
We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior.
no code implementations • 17 Nov 2019 • Peter L. Bartlett, Jonathan Baxter
In this paper, we derive a new model of synaptic plasticity, based on recent algorithms for reinforcement learning (in which an agent attempts to learn appropriate actions to maximize its long-term average reward).
1 code implementation • 9 Oct 2019 • Tan Nguyen, Nan Ye, Peter L. Bartlett
Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull.
no code implementations • 1 Oct 2019 • Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett
We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function.
no code implementations • 28 Aug 2019 • Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan
We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities.
no code implementations • 27 Jul 2019 • Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan
We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers.
no code implementations • 25 Jul 2019 • Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett
We present an improved analysis of the Euler-Maruyama discretization of the Langevin diffusion.
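For context, the Euler-Maruyama discretization of the (overdamped) Langevin diffusion is the standard unadjusted Langevin update; a minimal sketch, with an illustrative target:

```python
import numpy as np

def langevin_euler_maruyama(grad_U, x0, step=1e-2, n_steps=10_000, seed=0):
    """Euler-Maruyama discretization of the overdamped Langevin diffusion
        dX_t = -grad U(X_t) dt + sqrt(2) dB_t,
    whose stationary distribution has density proportional to exp(-U(x))."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# Example: sample from a standard Gaussian, U(x) = ||x||^2 / 2 so grad U(x) = x.
chain = langevin_euler_maruyama(grad_U=lambda x: x, x0=np.zeros(2))
```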
no code implementations • ICML 2020 • Xiang Cheng, Dong Yin, Peter L. Bartlett, Michael I. Jordan
We prove quantitative convergence rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation.
no code implementations • 26 Jun 2019 • Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler
Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction.
no code implementations • 30 May 2019 • Niladri S. Chatterji, Jelena Diakonikolas, Michael I. Jordan, Peter L. Bartlett
Langevin Monte Carlo (LMC) is an iterative algorithm used to generate samples from a distribution that is known only up to a normalizing constant.
no code implementations • 24 May 2019 • Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett
We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information.
1 code implementation • 6 Feb 2019 • Yeshwanth Cherapanamjeri, Nicolas Flammarion, Peter L. Bartlett
We propose an estimator for the mean of a random vector in $\mathbb{R}^d$ that can be computed in time $O(n^4+n^2d)$ for $n$ i.i.d. samples and that has error bounds matching the sub-Gaussian case.
no code implementations • 6 Feb 2019 • Yeshwanth Cherapanamjeri, Peter L. Bartlett
We study the problem of identity testing of Markov chains.
no code implementations • 3 Feb 2019 • Xiang Cheng, Peter L. Bartlett, Michael I. Jordan
In this paper, we prove quantitative convergence in $W_2$ for a family of Langevin-like stochastic processes that includes stochastic gradient descent and related gradient-based algorithms.
no code implementations • 6 Jan 2019 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek
Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class.
no code implementations • 20 Dec 2018 • Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright
We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback.
no code implementations • NeurIPS 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan
In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.
no code implementations • NeurIPS 2018 • Alan Malek, Peter L. Bartlett
We consider online linear regression: at each round, an adversary reveals a covariate vector, the learner predicts a real value, the adversary reveals a label, and the learner suffers the squared prediction error.
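The interaction protocol just described, with a plain ridge-regression learner standing in for the minimax strategy analyzed in the paper (a sketch under that substitution, not the paper's algorithm):

```python
import numpy as np

def online_ridge(rounds, d, reg=1.0):
    """Online linear regression protocol: each round the adversary reveals a
    covariate x_t, the learner predicts, the adversary reveals the label y_t,
    and the learner suffers the squared prediction error.  The learner here is
    ridge regression on the observed history."""
    A = reg * np.eye(d)          # regularized second-moment matrix
    b = np.zeros(d)
    total_loss = 0.0
    for x, y in rounds:
        w = np.linalg.solve(A, b)        # current regularized least-squares weights
        pred = float(w @ x)              # learner's prediction for this round
        total_loss += (pred - y) ** 2    # squared prediction error
        A += np.outer(x, x)              # incorporate the revealed (x_t, y_t)
        b += y * x
    return total_loss

# Illustrative round sequence.
rng = np.random.default_rng(0)
rounds = [(rng.standard_normal(3), float(rng.standard_normal())) for _ in range(100)]
print(online_ridge(rounds, d=3))
```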
no code implementations • 20 Nov 2018 • Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan
In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.
no code implementations • 1 Oct 2018 • Peter L. Bartlett, Victor Gabillon, Michal Valko
The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ in the function evaluations and 2) the local smoothness, $d$, of the function.
no code implementations • 22 May 2018 • Vidya Muthukumar, Mitas Ray, Anant Sahai, Peter L. Bartlett
We introduce algorithms for online, full-information prediction that are competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings.
no code implementations • 4 May 2018 • Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan
We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
no code implementations • 13 Apr 2018 • Peter L. Bartlett, Steven N. Evans, Philip M. Long
This implies that $h$ can be represented to any accuracy by a deep residual network whose nonlinear layers compute functions with a small Lipschitz constant.
no code implementations • 27 Feb 2018 • Aldo Pacchiano, Niladri S. Chatterji, Peter L. Bartlett
We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
no code implementations • ICML 2018 • Peter L. Bartlett, David P. Helmbold, Philip M. Long
We provide polynomial bounds on the number of iterations for gradient descent to approximate the least squares matrix $\Phi$, in the case where the initial hypothesis $\Theta_1 = \cdots = \Theta_L = I$ has excess loss bounded by a small enough constant.
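A minimal sketch of this setting: gradient descent on a deep linear network initialized at the identity, fit by least squares (depth, step size, and data below are illustrative, not the constants from the paper):

```python
import numpy as np

def deep_linear_gd(X, Y, depth=3, lr=0.01, n_iters=2000):
    """Gradient descent on a deep linear network Theta_L ... Theta_1 initialized
    at the identity, fit to (X, Y) by least squares.  Columns of X are inputs and
    the corresponding columns of Y are targets.  Illustrative sketch only."""
    d, n = X.shape
    thetas = [np.eye(d) for _ in range(depth)]          # identity initialization
    for _ in range(n_iters):
        # Partial products Theta_l ... Theta_1 (prods[0] is the identity).
        prods = [np.eye(d)]
        for T in thetas:
            prods.append(T @ prods[-1])
        resid = prods[-1] @ X - Y                       # prediction error
        grads = []
        for l in range(depth):
            left = np.eye(d)
            for T in thetas[l + 1:]:
                left = T @ left                         # product of the layers above layer l
            # Gradient of the average squared error 0.5/n * ||Theta_L...Theta_1 X - Y||_F^2
            # with respect to layer l.
            grads.append(left.T @ resid @ (prods[l] @ X).T / n)
        thetas = [T - lr * g for T, g in zip(thetas, grads)]
    return thetas

# Example: recover a near-identity target Phi from noiseless data.
rng = np.random.default_rng(0)
Phi = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
X = rng.standard_normal((4, 50))
layers = deep_linear_gd(X, Phi @ X)
```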
no code implementations • ICML 2018 • Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan
We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion.
no code implementations • NeurIPS 2017 • Yasin Abbasi, Peter L. Bartlett, Victor Gabillon
We study minimax strategies for the online prediction problem with expert advice.
no code implementations • NeurIPS 2017 • Walid Krichene, Peter L. Bartlett
We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions.
no code implementations • NeurIPS 2017 • Niladri Chatterji, Peter L. Bartlett
However, in contrast to previous theoretical analyses for this problem, we replace a condition on the operator norm (that is, the largest magnitude singular value) of the true underlying dictionary $A^*$ with a condition on the matrix infinity norm (that is, the largest magnitude term).
no code implementations • NeurIPS 2017 • Niladri S. Chatterji, Peter L. Bartlett
We present theoretical guarantees for an alternating minimization algorithm for the dictionary learning/sparse coding problem.
no code implementations • 19 Jul 2017 • Walid Krichene, Peter L. Bartlett
We discuss the interaction between the parameters of the dynamics (learning rate and averaging weights) and the covariation of the noise process, and show, in particular, how the asymptotic rate of covariation affects the choice of parameters and, ultimately, the convergence rate.
no code implementations • 12 Jul 2017 • Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, Michael I. Jordan
We study the underdamped Langevin diffusion when the log of the target distribution is smooth and strongly concave.
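For reference, the underdamped Langevin diffusion takes the form (up to the scaling conventions used in the paper), with friction parameter $\gamma > 0$ and potential $U$ equal to the negative log of the target density: \begin{equation*} dv_t = -\gamma\, v_t\, dt - \nabla U(x_t)\, dt + \sqrt{2\gamma}\, dB_t, \qquad dx_t = v_t\, dt. \end{equation*}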
no code implementations • ICML 2017 • Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon
For activation functions that are also smooth, we show \emph{local linear convergence} guarantees of gradient descent under a resampling rule.
no code implementations • 8 Mar 2017 • Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian
We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function.
no code implementations • NeurIPS 2016 • Walid Krichene, Alexandre Bayen, Peter L. Bartlett
This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate $\eta(t)$, and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights $w(t)$.
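Written out, the coupling described above takes the form (a paraphrase of the sentence, with $\psi^*$ the conjugate of the mirror map; not the paper's exact normalization): \begin{equation*} \dot z(t) = -\eta(t)\, \nabla f(x(t)), \qquad x(t) = \frac{\int_0^t w(s)\, \nabla \psi^*(z(s))\, ds}{\int_0^t w(s)\, ds}. \end{equation*}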
18 code implementations • 9 Nov 2016 • Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel
The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP.
no code implementations • 19 Oct 2016 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek
We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.
no code implementations • 26 May 2016 • Xiang Cheng, Farbod Roosta-Khorasani, Stefan Palombo, Peter L. Bartlett, Michael W. Mahoney
We consider first order gradient methods for effectively optimizing a composite objective in the form of a sum of smooth and, potentially, non-smooth functions.
no code implementations • NeurIPS 2015 • Wouter M. Koolen, Alan Malek, Peter L. Bartlett, Yasin Abbasi
We consider an adversarial formulation of the problem of predicting a time series with square loss.
no code implementations • NeurIPS 2015 • Walid Krichene, Alexandre Bayen, Peter L. Bartlett
We study accelerated mirror descent dynamics in continuous and discrete time.
no code implementations • NeurIPS 2014 • Wouter M. Koolen, Alan Malek, Peter L. Bartlett
We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance.
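For concreteness, the squared Mahalanobis loss between a prediction $\hat y$ and an outcome $y$ is \begin{equation*} \ell_M(\hat y, y) = (\hat y - y)^\top M\, (\hat y - y), \qquad M \succeq 0, \end{equation*} which reduces to the squared Euclidean distance when $M = I$.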
no code implementations • NeurIPS 2014 • Alex Kantchelian, Michael C. Tschantz, Ling Huang, Peter L. Bartlett, Anthony D. Joseph, J. D. Tygar
We present the Convex Polytope Machine (CPM), a novel non-linear learning algorithm for large-scale binary classification tasks.
no code implementations • 27 Feb 2014 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek
We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost.
no code implementations • 29 Jan 2014 • J. Hyam Rubinstein, Benjamin I. P. Rubinstein, Peter L. Bartlett
The most promising approach to positively resolving the conjecture is by embedding general VC classes into maximum classes without super-linear increase to their VC dimensions, as such embeddings would extend the known compression schemes to all VC classes.
no code implementations • NeurIPS 2013 • Yasin Abbasi, Peter L. Bartlett, Varun Kanade, Yevgeny Seldin, Csaba Szepesvari
The goal of the learning algorithm is to choose a path that minimizes the loss while traversing from the start to finish node.
no code implementations • NeurIPS 2013 • Jacob Abernethy, Peter L. Bartlett, Rafael Frongillo, Andre Wibisono
We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor.
no code implementations • 3 Jun 2011 • Jonathan Baxter, Peter L. Bartlett
In this paper we introduce GPOMDP, a simulation-based algorithm for generating a {\em biased} estimate of the gradient of the {\em average reward} in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies.
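A sketch in the spirit of GPOMDP's eligibility-trace gradient estimator (the environment and policy interfaces below are illustrative placeholders, not part of the paper):

```python
import numpy as np

def gpomdp_gradient(env_reset, env_step, sample_action, grad_log_policy,
                    theta, beta=0.9, T=10_000, seed=0):
    """Biased gradient estimate of the average reward, in the spirit of GPOMDP:
    accumulate a discounted eligibility trace of policy score functions along a
    single trajectory and average the reward-weighted traces."""
    rng = np.random.default_rng(seed)
    obs = env_reset()
    z = np.zeros_like(theta)        # eligibility trace of score functions
    delta = np.zeros_like(theta)    # running gradient estimate
    for t in range(T):
        action = sample_action(theta, obs, rng)
        next_obs, reward = env_step(action)
        z = beta * z + grad_log_policy(theta, obs, action)   # discounted score accumulation
        delta += (reward * z - delta) / (t + 1)              # running average of reward * trace
        obs = next_obs
    return delta

# Toy usage: one-parameter Bernoulli policy in a trivial single-observation environment.
def env_reset(): return 0
def env_step(a): return 0, float(a)                          # reward equals the chosen action
def sample_action(th, obs, rng): return int(rng.random() < 1 / (1 + np.exp(-th[0])))
def grad_log_policy(th, obs, a):
    p = 1 / (1 + np.exp(-th[0]))
    return np.array([a - p])                                 # d/dtheta of log Bernoulli(p)

grad_est = gpomdp_gradient(env_reset, env_step, sample_action, grad_log_policy,
                           theta=np.zeros(1))
```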
no code implementations • NeurIPS 2009 • Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
The extensive use of convex optimization in machine learning and statistics makes such an understanding critical for characterizing the fundamental computational limits of learning and estimation.
no code implementations • NeurIPS 2007 • Ambuj Tewari, Peter L. Bartlett
OLP is closely related to an algorithm proposed by Burnetas and Katehakis, with four key differences: OLP is simpler, it does not require knowledge of the supports of the transition probabilities, and the proof of its regret bound is simpler, but our regret bound is a constant factor larger than that of their algorithm.