no code implementations • 1 Oct 2024 • Kabir Aladin Verchand, Andrea Montanari
When both the covariates and the additive corruptions are independent and normally distributed, we provide exact characterizations of both the prediction error as well as the estimation error.
1 code implementation • 25 Jul 2024 • Katherine L. Mentzer, Andrea Montanari
In order to capture this trade-off and optimize storage of training data, we propose a `storage scaling law' that describes the joint evolution of test error with sample size and number of bits per image.
no code implementations • 5 Jun 2024 • Andrea Montanari, Kangjie Zhou
Given $d$-dimensional standard Gaussian vectors $\boldsymbol{x}_1,\dots, \boldsymbol{x}_n$, we consider the set of all empirical distributions of their $m$-dimensional projections, for $m$ a fixed constant.
no code implementations • 6 Feb 2024 • Ayush Jain, Andrea Montanari, Eren Sasoglu
We analyze mathematically this method under several classical statistical models, and validate our findings empirically on datasets from different domains.
no code implementations • 29 Sep 2023 • Andrea Montanari, Feng Ruan, Basil Saeed, Youngtak Sohn
Working in the high-dimensional regime in which the number of features $p$, the number of samples $n$ and the input dimension $d$ (in the nonlinear featurization setting) diverge, with ratios of order one, we prove a universality result establishing that the asymptotic behavior is completely determined by the expected covariance of feature vectors and by the covariance between features and labels.
no code implementations • 25 Sep 2023 • Germain Kolossov, Andrea Montanari, Pulkit Tandon
Given a sample of size $N$, it is often useful to select a subsample of smaller size $n<N$ to be used for statistical estimation or learning.
no code implementations • 25 Aug 2023 • Theodor Misiakiewicz, Andrea Montanari
In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models.
no code implementations • 18 May 2023 • Andrea Montanari
Diffusions are a successful technique to sample from high-dimensional distributions that can be either explicitly given or learnt from a collection of samples.
no code implementations • 28 Feb 2023 • Raphaël Berthier, Andrea Montanari, Kangjie Zhou
In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates).
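For concreteness, a single-index model as described above can be written in the following standard form (the Gaussian covariate law and unit-norm index vector are illustrative assumptions, and $\varphi$ is a generic link function):
$$ y = \varphi\big(\langle \boldsymbol{w}^{*}, \boldsymbol{x} \rangle\big), \qquad \boldsymbol{x} \sim \mathsf{N}(0, \mathrm{I}_d), \qquad \|\boldsymbol{w}^{*}\|_2 = 1. $$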
no code implementations • 16 Oct 2022 • Chen Cheng, Andrea Montanari
However, random matrix theory is largely focused on the proportional asymptotics in which the number of columns grows proportionally to the number of rows of the data matrix.
no code implementations • 14 Jun 2022 • Andrea Montanari, Kangjie Zhou
Denoting by $\mathscr{F}_{m, \alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m, \alpha}$.
no code implementations • 31 Mar 2022 • Andrea Montanari, Yuchen Wu
A substantial body of empirical work documents the lack of robustness in deep learning models to adversarial examples.
no code implementations • 17 Feb 2022 • Andrea Montanari, Basil Saeed
In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors ${\boldsymbol x}_i$ are replaced by Gaussian vectors ${\boldsymbol g}_i$ with the same covariance.
no code implementations • 28 Oct 2021 • Andrea Montanari, Yiqiao Zhong, Kangjie Zhou
In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i, y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label.
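A hedged formalization of the feasibility question behind the negative perceptron (the exact normalization used in the paper may differ): one asks for a unit vector achieving margin at least $\kappa$, where $\kappa$ is allowed to be negative,
$$ \exists\, \boldsymbol{\theta} \in \mathbb{R}^{d}, \ \|\boldsymbol{\theta}\|_2 = 1: \quad y_i \langle \boldsymbol{\theta}, \boldsymbol{x}_i \rangle \ \ge\ \kappa \quad \text{for all } i \le n, \qquad \kappa < 0. $$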
no code implementations • NeurIPS 2021 • Yuchen Wu, Mohammadhossein Bateni, Andre Linhares, Filipe Miguel Goncalves de Almeida, Andrea Montanari, Ashkan Norouzi-Fard, Jakab Tardos
The community detection problem requires clustering the nodes of a network into a small number of well-connected "communities".
no code implementations • 30 Mar 2021 • Michael Celentano, Theodor Misiakiewicz, Andrea Montanari
We study random features approximations to these norms and show that, for $p>1$, the number of random features required to approximate the original learning problem is upper bounded by a polynomial in the sample size.
no code implementations • 16 Mar 2021 • Peter L. Bartlett, Andrea Montanari, Alexander Rakhlin
We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting.
no code implementations • 25 Feb 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Certain neural network architectures -- for instance, convolutional networks -- are believed to owe their success to the fact that they exploit such invariance properties.
no code implementations • 26 Jan 2021 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N\le n^{1-\delta}$ for some $\delta>0$.
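A minimal sketch of random features ridge regression with $N$ features (an illustrative implementation; the ReLU activation and the regularization constant are assumptions, not the paper's exact setup):

    import numpy as np

    def random_features_ridge(X_train, y_train, X_test, N, lam=1e-3, seed=0):
        # Fixed random first-layer weights; only the second layer is trained.
        rng = np.random.default_rng(seed)
        d = X_train.shape[1]
        W = rng.normal(size=(d, N)) / np.sqrt(d)
        # Nonlinear featurization (ReLU chosen for illustration).
        Z_train = np.maximum(X_train @ W, 0.0)
        Z_test = np.maximum(X_test @ W, 0.0)
        # Ridge regression on top of the random features.
        A = Z_train.T @ Z_train + lam * np.eye(N)
        a = np.linalg.solve(A, Z_train.T @ y_train)
        return Z_test @ a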
no code implementations • 6 Nov 2020 • Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley
Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains.
no code implementations • 27 Jul 2020 • Michael Celentano, Andrea Montanari, Yuting Wei
On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one.
no code implementations • 25 Jul 2020 • Andrea Montanari, Yiqiao Zhong
We assume that both the sample size $n$ and the dimension $d$ are large, and they are polynomially related.
1 code implementation • NeurIPS 2020 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.
no code implementations • 28 Feb 2020 • Michael Celentano, Andrea Montanari, Yuchen Wu
These lower bounds are optimal in the sense that there exist algorithms whose estimation error matches the lower bounds up to asymptotically negligible terms.
no code implementations • 24 Jan 2020 • Kabir Aladin Chandrasekher, Ahmed El Alaoui, Andrea Montanari
We study high-dimensional regression with missing entries in the covariates.
1 code implementation • NeurIPS 2019 • Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors x_i are d-dimensional Gaussian and responses are y_i = f_*(x_i) for f_* an unknown quadratic function; (2) Feature vectors x_i are distributed as a mixture of two d-dimensional centered Gaussians, and y_i's are the corresponding class labels.
no code implementations • 5 Nov 2019 • Andrea Montanari, Feng Ruan, Youngtak Sohn, Jun Yan
They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes.
no code implementations • 14 Aug 2019 • Song Mei, Andrea Montanari
We compute the precise asymptotics of the test error, in the limit $N, n, d\to \infty$ with $N/d$ and $n/d$ fixed.
1 code implementation • 21 Jun 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
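A minimal data-generation sketch for the two models above (the specific quadratic target and the mixture covariances below are placeholders, not the paper's choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 50

    # Model (1): Gaussian features, quadratic target f_*(x) = <x, B x>.
    B = rng.normal(size=(d, d)) / d
    X1 = rng.normal(size=(n, d))
    y1 = np.einsum("ni,ij,nj->n", X1, B, X1)

    # Model (2): mixture of two centered Gaussians (differing covariances),
    # with labels given by the mixture component.
    labels = rng.integers(0, 2, size=n)
    scales = np.where(labels == 1, 1.5, 1.0)   # per-class standard deviation
    X2 = rng.normal(size=(n, d)) * scales[:, None]
    y2 = 2 * labels - 1                        # class labels in {-1, +1}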
no code implementations • 27 Apr 2019 • Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Both these approaches can also be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.
no code implementations • 19 Mar 2019 • Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani
Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type.
no code implementations • 16 Feb 2019 • Song Mei, Theodor Misiakiewicz, Andrea Montanari
Earlier work shows that (under some regularity assumptions), the mean field description is accurate as soon as the number of hidden units is much larger than the dimension $D$.
no code implementations • 5 Jan 2019 • Adel Javanmard, Marco Mondelli, Andrea Montanari
We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$.
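For background, a Wasserstein gradient flow of a risk functional $F(\rho)$ over probability distributions on $\Omega$ takes the general form below (stated generically; the paper's specific functional is not reproduced here):
$$ \partial_t \rho_t \;=\; \nabla \cdot \Big( \rho_t \, \nabla \tfrac{\delta F}{\delta \rho}(\rho_t) \Big), $$
where $\tfrac{\delta F}{\delta \rho}$ denotes the first variation of $F$.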
no code implementations • NeurIPS 2018 • Yash Deshpande, Andrea Montanari, Elchanan Mossel, Subhabrata Sen
We provide the first information-theoretically tight analysis for inference of latent community structure given a sparse graph along with high-dimensional node covariates, correlated with the same latent communities.
no code implementations • 18 Apr 2018 • Song Mei, Andrea Montanari, Phan-Minh Nguyen
Does SGD converge to a global optimum of the risk or only to a local optimum?
no code implementations • 20 Feb 2018 • Marco Mondelli, Andrea Montanari
Our conclusion holds for a `natural data distribution', namely standard Gaussian feature vectors $\boldsymbol x$, and output distributed according to a two-layer neural network with random isotropic weights, and under a certain complexity-theoretic assumption on tensor decomposition.
no code implementations • 2 Feb 2018 • Behrooz Ghorbani, Hamid Javadi, Andrea Montanari
Namely, for certain regimes of the model parameters, variational inference outputs a non-trivial decomposition into topics.
no code implementations • 15 Nov 2017 • Gerard Ben Arous, Song Mei, Andrea Montanari, Mihai Nica
We compute the expected number of critical points and local maxima of this objective function, show that these counts are exponential in the dimension $n$, and give exact formulas for the exponential growth rate.
1 code implementation • 6 Nov 2017 • Andrea Montanari, Ramji Venkataramanan
In this paper we present a practical algorithm that can achieve Bayes-optimal accuracy above the spectral threshold.
no code implementations • NeurIPS 2017 • Murat A. Erdogdu, Yash Deshpande, Andrea Montanari
We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses.
no code implementations • 22 Aug 2017 • Stratis Ioannidis, Andrea Montanari
In a nutshell, we estimate the gradient of the regression function at a set of random points, and cluster the estimated gradients.
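A minimal sketch of this estimate-then-cluster idea (weighted local linear fits and k-means are illustrative choices, not necessarily the estimators used in the paper):

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_local_gradients(X, y, centers, k, bandwidth=0.5):
        # Estimate the regression gradient at each center via a weighted local
        # linear fit, then cluster the estimated gradient vectors into k groups.
        grads = []
        for c in centers:
            w = np.exp(-np.sum((X - c) ** 2, axis=1) / (2 * bandwidth ** 2))
            Xc = np.hstack([np.ones((X.shape[0], 1)), X - c])  # intercept + local coords
            A = Xc.T @ (w[:, None] * Xc) + 1e-8 * np.eye(X.shape[1] + 1)
            b = Xc.T @ (w * y)
            coef = np.linalg.solve(A, b)
            grads.append(coef[1:])                              # drop the intercept
        return KMeans(n_clusters=k, n_init=10).fit_predict(np.array(grads))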
no code implementations • 20 Aug 2017 • Marco Mondelli, Andrea Montanari
In phase retrieval we want to recover an unknown signal $\boldsymbol x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i = |\langle{\boldsymbol a}_i,{\boldsymbol x}\rangle|^2+w_i$ where $\boldsymbol a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise.
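A minimal sketch generating measurements of exactly this form (complex Gaussian sensing vectors and Gaussian noise are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 400, 100, 0.1

    x = rng.normal(size=d) + 1j * rng.normal(size=d)                  # unknown signal in C^d
    A = (rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))) / np.sqrt(2)
    w = sigma * rng.normal(size=n)                                    # measurement noise
    y = np.abs(A @ x) ** 2 + w                                        # y_i = |<a_i, x>|^2 + w_i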
no code implementations • 8 May 2017 • Hamid Javadi, Andrea Montanari
In this paper, we study an approach to NMF that can be traced back to the work of Cutler and Breiman (1994) and does not require the data to be separable, while providing a generally unique decomposition.
no code implementations • 25 Mar 2017 • Song Mei, Theodor Misiakiewicz, Andrea Montanari, Roberto I. Oliveira
In this paper we study the rank-constrained version of SDPs arising in MaxCut and in synchronization problems.
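For example, in the MaxCut case the SDP and its rank-$k$ (Burer-Monteiro) restriction can be written as follows (a standard formulation, included for context):
$$ \max_{X \succeq 0,\ X_{ii}=1} \ \langle C, X \rangle \qquad \text{versus} \qquad \max_{\sigma_1,\dots,\sigma_n \in \mathbb{S}^{k-1}} \ \sum_{i,j} C_{ij} \langle \sigma_i, \sigma_j \rangle, $$
where $C$ is determined by the graph and the rank-constrained problem corresponds to $X = \sigma \sigma^{\top}$ with rows $\sigma_i$ on the unit sphere.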
no code implementations • 23 Dec 2016 • Andrea Montanari, Nike Sun
In the tensor completion problem, one seeks to estimate a low-rank tensor based on a random sample of revealed entries.
no code implementations • 17 Oct 2016 • Zhou Fan, Andrea Montanari
Several probabilistic models from high-dimensional statistics and machine learning reveal an intriguing -- and yet poorly understood -- dichotomy.
no code implementations • 22 Jul 2016 • Song Mei, Yu Bai, Andrea Montanari
We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors).
no code implementations • 30 Mar 2016 • Adel Javanmard, Andrea Montanari, Federico Ricci-Tersenghi
In this paper we study in detail several practical aspects of this new algorithm based on semidefinite programming for the detection of the planted partition.
no code implementations • 29 Mar 2016 • Adel Javanmard, Andrea Montanari
In this paper we consider the problem of controlling FDR in an "online manner".
no code implementations • 13 Mar 2016 • Andrea Montanari
A large number of problems in optimization, machine learning, and signal processing can be effectively addressed by suitable semidefinite programming (SDP) relaxations.
no code implementations • NeurIPS 2015 • Andrea Montanari, Daniel Reichman, Ofer Zeitouni
We consider the following detection problem: given a realization of a symmetric matrix $X$ of dimension $n$, distinguish between the hypothesis that all upper triangular variables are i.i.d.
no code implementations • NeurIPS 2015 • Murat A. Erdogdu, Andrea Montanari
In this regime, algorithms which utilize sub-sampling techniques are known to be effective.
no code implementations • 11 Aug 2015 • Adel Javanmard, Andrea Montanari
When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/ (\log p)^2)$.
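For reference, when $\Sigma = \mathbb{E}[\boldsymbol{x} \boldsymbol{x}^{\top}]$ is known, one standard form of the debiased estimator is (stated as background; the paper's notation may differ):
$$ \hat{\theta}^{\mathrm{d}} \;=\; \hat{\theta}^{\mathrm{Lasso}} + \frac{1}{n} \Sigma^{-1} X^{\top} \big( y - X \hat{\theta}^{\mathrm{Lasso}} \big), $$
whose coordinates are then asymptotically Gaussian under the stated sparsity condition $s_0 = o(n/(\log p)^2)$.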
no code implementations • 23 Feb 2015 • Yash Deshpande, Andrea Montanari
Here we consider the degree-$4$ SOS relaxation, and study the construction of \cite{meka2013association} to prove that SOS fails unless $k\ge C\, n^{1/3}/\log n$.
1 code implementation • 22 Feb 2015 • Adel Javanmard, Andrea Montanari
Given a sequence of null hypotheses $\mathcal{H}(n) = (H_1,..., H_n)$, Benjamini and Hochberg \cite{benjamini1995controlling} introduced the false discovery rate (FDR) criterion, which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assigned significance level.
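A minimal sketch of the Benjamini-Hochberg procedure in its standard textbook form (included for reference; this is the classical offline procedure cited above, not the paper's contribution):

    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        # Reject the k smallest p-values, where k is the largest index with
        # p_(k) <= alpha * k / n; returns a boolean mask of rejections.
        pvals = np.asarray(pvals)
        n = len(pvals)
        order = np.argsort(pvals)
        thresholds = alpha * np.arange(1, n + 1) / n
        below = pvals[order] <= thresholds
        reject = np.zeros(n, dtype=bool)
        if below.any():
            k = np.max(np.nonzero(below)[0])   # largest index meeting its threshold
            reject[order[: k + 1]] = True
        return reject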
no code implementations • 19 Feb 2015 • Andrea Montanari
This can be regarded as a model for the problem of finding a tightly knitted community in a social network, or a cluster in a relational dataset.
no code implementations • NeurIPS 2014 • Yash Deshpande, Andrea Montanari, Emile Richard
We consider a simple model for noisy quadratic observation of an unknown vector ${\boldsymbol v}_0$.
no code implementations • NeurIPS 2014 • Andrea Montanari, Emile Richard
This is possibly related to a fundamental limitation of computationally tractable estimators for this problem.
no code implementations • 19 Sep 2014 • Eric W. Tramel, Santhosh Kumar, Andrei Giurgiu, Andrea Montanari
These notes review six lectures given by Prof. Andrea Montanari on the topic of statistical estimation for linear models.
no code implementations • 12 Sep 2014 • Andrea Montanari
Given a large dataset and an estimation task, it is common to pre-process the data by reducing them to a set of sufficient statistics.
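A standard example of such a reduction (a generic illustration, not specific to the paper): in the Gaussian linear model $y = X\theta + w$ with $w \sim \mathsf{N}(0, \sigma^2 \mathrm{I}_n)$ and $\sigma^2$ known, the pair
$$ \big( X^{\top} X, \; X^{\top} y \big) $$
is sufficient for $\theta$, so the full $n \times p$ dataset can be compressed to these statistics without loss for estimating $\theta$.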
no code implementations • 9 Aug 2014 • Amy Zhang, Nadia Fawaz, Stratis Ioannidis, Andrea Montanari
It is often the case that, within an online recommender system, multiple users share a common account.
no code implementations • 31 Mar 2014 • Stratis Ioannidis, Andrea Montanari, Udi Weinsberg, Smriti Bhagat, Nadia Fawaz, Nina Taft
Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data.
no code implementations • NeurIPS 2013 • Adel Javanmard, Andrea Montanari
This in turn implies that it is extremely challenging to quantify the `uncertainty' associated with a certain parameter estimate.
no code implementations • NeurIPS 2013 • Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
In this context, we develop new estimators for the $\ell_2$ estimation risk $\|\hat{\theta}-\theta_0\|_2$ and the variance of the noise.
no code implementations • NeurIPS 2014 • Yash Deshpande, Andrea Montanari
In an influential paper, \cite{johnstone2004sparse} introduced a simple algorithm that estimates the support of the principal vectors $\mathbf{v}_1,\dots,\mathbf{v}_r$ by the largest entries in the diagonal of the empirical covariance.
no code implementations • 11 Nov 2013 • Yuekai Sun, Stratis Ioannidis, Andrea Montanari
We consider a discriminative learning (regression) problem, whereby the regression function is a convex combination of k linear classifiers.
no code implementations • 1 Nov 2013 • Adel Javanmard, Andrea Montanari
In the regime where the number of parameters $p$ is comparable to or exceeds the sample size $n$, a successful approach uses an $\ell_1$-penalized least squares estimator, known as Lasso.
no code implementations • NeurIPS 2013 • Adel Javanmard, Andrea Montanari
This in turn implies that it is extremely challenging to quantify the \emph{uncertainty} associated with a certain parameter estimate.
no code implementations • NeurIPS 2013 • Adel Javanmard, Andrea Montanari
In the high-dimensional regression model a response variable is linearly related to $p$ covariates, but the sample size $n$ is smaller than $p$.
no code implementations • 17 Jan 2013 • Adel Javanmard, Andrea Montanari
In this case we prove that a similar distributional characterization (termed `standard distributional limit') holds for $n$ much larger than $s_0(\log p)^2$.
no code implementations • 18 Dec 2012 • Morteza Ibrahimi, Andrea Montanari, George S Moore
We study a simple modification to the conventional time of flight mass spectrometry (TOFMS) where a \emph{variable} and (pseudo)-\emph{random} pulsing rate is used which allows for traces from different pulses to overlap.
no code implementations • NeurIPS 2010 • José Pereira, Morteza Ibrahimi, Andrea Montanari
We consider linear models for stochastic dynamics.
no code implementations • NeurIPS 2010 • Mohsen Bayati, José Pereira, Andrea Montanari
We consider the problem of learning a coefficient vector $x_0$ from noisy linear observations $y = A x_0 + w$.
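A minimal sketch of this observation model together with a standard Lasso fit (a generic scikit-learn illustration, not the asymptotic analysis carried out in the paper):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p, s, sigma = 200, 500, 10, 0.1

    A = rng.normal(size=(n, p)) / np.sqrt(n)        # design / sensing matrix
    x0 = np.zeros(p)
    x0[:s] = rng.normal(size=s)                     # s-sparse coefficient vector
    y = A @ x0 + sigma * rng.normal(size=n)         # noisy linear observations

    xhat = Lasso(alpha=0.01, fit_intercept=False).fit(A, y).coef_
    print(np.linalg.norm(xhat - x0))                # estimation error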
1 code implementation • 8 Apr 2010 • David L. Donoho, Arian Maleki, Andrea Montanari
We develop formal expressions for the MSE of $\hat{x}^{\lambda}$, and evaluate its worst-case formal noise sensitivity over all types of $k$-sparse signals.
no code implementations • NeurIPS 2009 • Andrea Montanari, Jose A. Pereira
We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples.
1 code implementation • NeurIPS 2009 • Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh
Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries.
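A minimal sketch of the setup, with a rescaled-SVD spectral baseline (an illustration only, not the full reconstruction algorithm of the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, r, frac, sigma = 200, 150, 3, 0.2, 0.05

    M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))   # rank-r matrix
    mask = rng.random((n, m)) < frac                         # random subset of revealed entries
    Y = np.where(mask, M + sigma * rng.normal(size=(n, m)), 0.0)

    # Spectral baseline: rescale for the sampling rate, then project onto rank r.
    U, s, Vt = np.linalg.svd(Y / frac, full_matrices=False)
    M_hat = (U[:, :r] * s[:r]) @ Vt[:r]

    print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))    # relative reconstruction error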
1 code implementation • 20 Jan 2009 • Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh
In the process of proving these statements, we obtain a generalization of a celebrated result by Friedman-Kahn-Szemeredi and Feige-Ofek on the spectrum of sparse random matrices.
no code implementations • 11 Sep 2007 • Andrea Montanari, Federico Ricci-Tersenghi, Guilhem Semerjian
Message passing algorithms have proved surprisingly successful in solving hard constraint satisfaction problems on sparse random graphs.