Search Results for author: Marco Mondelli

Found 30 papers, 4 papers with code

Average gradient outer product as a mechanism for deep neural collapse

no code implementations21 Feb 2024 Daniel Beaglehole, Peter Súkeník, Marco Mondelli, Mikhail Belkin

In this work, we provide substantial evidence that deep neural collapse (DNC) formation occurs primarily through deep feature learning with the average gradient outer product (AGOP).
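
For readers unfamiliar with the AGOP, the sketch below gives its standard definition in toy NumPy form; it is only an illustration of the quantity, not code from the paper.

import numpy as np

def agop(jacobians):
    # jacobians: array of shape (n, out_dim, in_dim) holding the input Jacobian
    # of the network at each of the n samples.
    # AGOP = (1/n) * sum_i J(x_i)^T J(x_i), an (in_dim, in_dim) matrix.
    n = jacobians.shape[0]
    return np.einsum('noi,noj->ij', jacobians, jacobians) / n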

Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth

no code implementations7 Feb 2024 Kevin Kögler, Alexander Shevchenko, Hamed Hassani, Marco Mondelli

For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.

Data Compression · Denoising
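
As a rough illustration of the setting (my own toy construction; sizes, sparsity level, and the random encoder are assumptions, not the paper's setup): sparse Gaussian inputs are compressed to one bit per coordinate and reconstructed by a sparsity-agnostic linear decoder.

import numpy as np

rng = np.random.default_rng(0)
n, d, p = 10_000, 128, 0.1                        # samples, dimension, sparsity level

mask = rng.random((n, d)) < p                     # each entry is nonzero with probability p
X = mask * rng.standard_normal((n, d))            # sparse Gaussian data

A = rng.standard_normal((d, d)) / np.sqrt(d)      # encoder weights (random, untrained)
codes = np.sign(X @ A.T)                          # 1-bit compression of each sample
X_hat = codes @ np.linalg.pinv(A).T               # linear decoder that ignores sparsity

print("reconstruction MSE:", np.mean((X - X_hat) ** 2))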

Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features

no code implementations5 Feb 2024 Simone Bombari, Marco Mondelli

Unveiling the reasons behind the exceptional success of transformers requires a better understanding of why attention layers are suitable for NLP tasks.

Generalization Bounds · Sentence

Spectral Estimators for Structured Generalized Linear Models via Approximate Message Passing

no code implementations28 Aug 2023 Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli

Our methodology is general, and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.

Improved Convergence of Score-Based Diffusion Models via Prediction-Correction

no code implementations23 May 2023 Francesco Pedrotti, Jan Maas, Marco Mondelli

Our key technical contribution is to provide convergence guarantees which require running the forward process only for a fixed finite time $T_1$.

Stability, Generalization and Privacy: Precise Analysis for Random and NTK Features

no code implementations20 May 2023 Simone Bombari, Marco Mondelli

Deep learning models can be vulnerable to recovery attacks, raising privacy concerns for users, and widespread algorithms such as empirical risk minimization (ERM) often do not directly enforce safety guarantees.

Learning Theory

Mismatched estimation of non-symmetric rank-one matrices corrupted by structured noise

no code implementations7 Feb 2023 Teng Fu, Yuhao Liu, Jean Barbier, Marco Mondelli, Shansuo Liang, Tianqi Hou

We study the performance of a Bayesian statistician who estimates a rank-one signal corrupted by non-symmetric rotationally invariant noise with a generic distribution of singular values.

Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels

1 code implementation3 Feb 2023 Simone Bombari, Shayan Kiyani, Marco Mondelli

However, this "universal" law provides only a necessary condition for robustness, and it is unable to discriminate between models.

Approximate Message Passing for Multi-Layer Estimation in Rotationally Invariant Models

1 code implementation3 Dec 2022 Yizhou Xu, Tianqi Hou, Shansuo Liang, Marco Mondelli

We consider the problem of reconstructing the signal and the hidden variables from observations coming from a multi-layer network with rotationally invariant weight matrices.

Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

no code implementations21 Nov 2022 Yihan Zhang, Marco Mondelli, Ramji Venkataramanan

In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one.

Retrieval
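
A minimal sketch of the data model described above (variable names, mixture weights, and noise level are my own choices): each observation is generated from exactly one of two hidden signals, and the assignment is never observed.

import numpy as np

rng = np.random.default_rng(1)
n, d = 5_000, 50
beta1, beta2 = rng.standard_normal(d), rng.standard_normal(d)   # the two hidden signals
X = rng.standard_normal((n, d)) / np.sqrt(d)
z = rng.random(n) < 0.5                           # latent label: which signal generated the sample
y = np.where(z, X @ beta1, X @ beta2) + 0.1 * rng.standard_normal(n)
# Estimation task: recover beta1 and beta2 from (X, y) alone, without ever seeing z.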

Finite Sample Identification of Wide Shallow Neural Networks with Biases

no code implementations8 Nov 2022 Massimo Fornasier, Timo Klock, Marco Mondelli, Michael Rauchensteiner

Artificial neural networks are functions depending on a finite number of parameters, typically encoded as weights and biases.

Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence

no code implementations13 Oct 2022 Diyuan Wu, Vyacheslav Kungurtsev, Marco Mondelli

In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by stochastic heavy ball (SHB): (i) stability after dropping out part of the neurons, (ii) connectivity along a low-loss path, and (iii) convergence to the global optimum.
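
For context, a generic stochastic heavy ball update is shown below; this is the textbook momentum recursion, not an excerpt from the paper.

def heavy_ball_step(theta, velocity, grad, lr=0.01, momentum=0.9):
    # v_{t+1} = beta * v_t - eta * grad ;  theta_{t+1} = theta_t + v_{t+1}
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity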

Bayes-optimal limits in structured PCA, and how to reach them

1 code implementation3 Oct 2022 Jean Barbier, Francesco Camilli, Marco Mondelli, Manuel Saenz

To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
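
A toy version of the spiked matrix model and its PCA estimate, with unstructured Gaussian noise for simplicity (signal prior, SNR, and sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(2)
n, snr = 2_000, 2.0
x = rng.choice([-1.0, 1.0], size=n)               # Rademacher spike, ||x||^2 = n
W = rng.standard_normal((n, n))
W = (W + W.T) / np.sqrt(2 * n)                    # symmetric (Wigner) noise
Y = (snr / n) * np.outer(x, x) + W                # rank-one spike + additive noise

x_hat = np.linalg.eigh(Y)[1][:, -1]               # PCA: top eigenvector of Y
print("overlap with the spike:", abs(x_hat @ x) / np.sqrt(n))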

The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?

no code implementations20 May 2022 Jean Barbier, Tianqi Hou, Marco Mondelli, Manuel Sáenz

We consider the problem of estimating a rank-1 signal corrupted by structured rotationally invariant noise, and address the following question: how well do inference algorithms perform when the noise statistics are unknown and hence Gaussian noise is assumed?
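
A sketch of what "structured rotationally invariant noise" means (my own construction, using scipy's ortho_group for the Haar-uniform rotation): the noise eigenvectors are uniformly random, while the eigenvalue distribution can be arbitrary and, in particular, non-Gaussian.

import numpy as np
from scipy.stats import ortho_group

n = 500
rng = np.random.default_rng(3)
O = ortho_group.rvs(n, random_state=3)            # Haar-uniform orthogonal matrix
spectrum = rng.uniform(-2.0, 2.0, size=n)         # arbitrary (non-semicircle) eigenvalue law
W = O @ np.diag(spectrum) @ O.T                   # rotationally invariant noise

x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)  # unit-norm rank-1 signal
Y = 1.5 * np.outer(x, x) + W
# Task: estimate x from Y when the law of `spectrum` is unknown to the statistician.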

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

no code implementations20 May 2022 Simone Bombari, Mohammad Hossein Amani, Marco Mondelli

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.

Memorization · Open-Ended Question Answering

Sharp asymptotics on the compression of two-layer neural networks

no code implementations17 May 2022 Mohammad Hossein Amani, Simone Bombari, Marco Mondelli, Rattana Pukdee, Stefano Rini

In this paper, we study the compression of a target two-layer neural network with $N$ nodes into a compressed network with $M < N$ nodes.

Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message Passing

no code implementations8 Dec 2021 Ramji Venkataramanan, Kevin Kögler, Marco Mondelli

We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices.

Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

no code implementations3 Nov 2021 Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli

Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning.

PCA Initialization for Approximate Message Passing in Rotationally Invariant Models

no code implementations NeurIPS 2021 Marco Mondelli, Ramji Venkataramanan

However, the existing analysis of AMP requires an initialization that is both correlated with the signal and independent of the noise, which is often unrealistic in practice.

When Are Solutions Connected in Deep Networks?

1 code implementation NeurIPS 2021 Quynh Nguyen, Pierre Brechet, Marco Mondelli

More specifically, we show that: (i) under generic assumptions on the features of intermediate layers, it suffices that the last two hidden layers have order of $\sqrt{N}$ neurons, and (ii) if subsets of features at each layer are linearly separable, then no over-parameterization is needed to show the connectivity.

Parallelism versus Latency in Simplified Successive-Cancellation Decoding of Polar Codes

no code implementations24 Dec 2020 Seyyed Ali Hashemi, Marco Mondelli, Arman Fazeli, Alexander Vardy, John Cioffi, Andrea Goldsmith

In particular, when the number of processing elements $P$ that can perform SSC decoding operations in parallel is limited, as is the case in practice, the latency of SSC decoding is $O\left(N^{1-1/\mu}+\frac{N}{P}\log_2\log_2\frac{N}{P}\right)$, where $N$ is the block length of the code and $\mu$ is the scaling exponent of the channel.

Information Theory
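
The latency expression above is easy to evaluate numerically; the helper below is just a back-of-the-envelope evaluation of the asymptotic shape (constants and lower-order terms dropped), with $\mu \approx 3.63$ as a commonly quoted scaling exponent for the binary erasure channel.

import math

def ssc_latency_scaling(N: int, P: int, mu: float = 3.63) -> float:
    # O(N^(1-1/mu) + (N/P) * log2(log2(N/P))), constants omitted
    return N ** (1 - 1 / mu) + (N / P) * math.log2(math.log2(N / P))

# e.g. block length 2**20 decoded with 64 processing elements
print(ssc_latency_scaling(2**20, 64))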

Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks

no code implementations21 Dec 2020 Quynh Nguyen, Marco Mondelli, Guido Montufar

In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths.

Memorization
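
To make the object concrete, here is a toy empirical NTK Gram matrix for a two-layer ReLU network and its smallest eigenvalue; width, dimensions, and scalings are my illustrative choices, not the paper's setting, which covers deep networks.

import numpy as np

rng = np.random.default_rng(4)
n, d, m = 100, 20, 2_000                          # samples, input dim, hidden width
X = rng.standard_normal((n, d)) / np.sqrt(d)
W = rng.standard_normal((m, d)) / np.sqrt(d)      # first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # second-layer weights

pre = X @ W.T                                     # pre-activations, shape (n, m)
act = np.maximum(pre, 0.0)                        # ReLU outputs
der = (pre > 0).astype(float)                     # ReLU derivatives

# NTK Gram matrix: gradients w.r.t. a contribute act @ act.T; gradients w.r.t. W
# contribute sum_k a_k^2 * der_ik * der_jk * <x_i, x_j>.
K = act @ act.T + ((der * a) @ (der * a).T) * (X @ X.T)
print("smallest NTK eigenvalue:", np.linalg.eigvalsh(K).min())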

Approximate Message Passing with Spectral Initialization for Generalized Linear Models

no code implementations7 Oct 2020 Marco Mondelli, Ramji Venkataramanan

We consider the problem of estimating a signal from measurements obtained via a generalized linear model.

Retrieval

Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

no code implementations7 Aug 2020 Marco Mondelli, Christos Thrampoulidis, Ramji Venkataramanan

This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$.

Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

no code implementations NeurIPS 2020 Quynh Nguyen, Marco Mondelli

Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples).

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

no code implementations ICML 2020 Alexander Shevchenko, Marco Mondelli

In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization.

Analysis of a Two-Layer Neural Network via Displacement Convexity

no code implementations5 Jan 2019 Adel Javanmard, Marco Mondelli, Andrea Montanari

We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$.

On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition

no code implementations20 Feb 2018 Marco Mondelli, Andrea Montanari

Our conclusion holds for a "natural data distribution", namely standard Gaussian feature vectors $\boldsymbol x$, and output distributed according to a two-layer neural network with random isotropic weights, and under a certain complexity-theoretic assumption on tensor decomposition.

Tensor Decomposition

Fundamental Limits of Weak Recovery with Applications to Phase Retrieval

no code implementations20 Aug 2017 Marco Mondelli, Andrea Montanari

In phase retrieval we want to recover an unknown signal $\boldsymbol x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i = |\langle{\boldsymbol a}_i,{\boldsymbol x}\rangle|^2+w_i$ where $\boldsymbol a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise.

Retrieval
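
A toy real-valued version of this measurement model, together with a simple spectral estimate of $\boldsymbol x$; the truncation preprocessing is one common choice from the literature, not necessarily the optimal one studied in the paper.

import numpy as np

rng = np.random.default_rng(5)
d, n = 100, 800
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                            # unit-norm signal
A = rng.standard_normal((n, d))                   # sensing vectors a_i as rows
y = (A @ x) ** 2 + 0.05 * rng.standard_normal(n)  # y_i = <a_i, x>^2 + noise

T = np.clip(y, None, 3.0)                         # truncate large measurements
D = (A.T * T) @ A / n                             # (1/n) * sum_i T(y_i) a_i a_i^T
x_hat = np.linalg.eigh(D)[1][:, -1]               # spectral estimate: top eigenvector
print("overlap |<x_hat, x>|:", abs(x_hat @ x))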
