Search Results for author: Emmanuel Abbe

Found 28 papers, 4 papers with code

Transformers learn through gradual rank increase

no code implementations NeurIPS 2023 Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind

Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.

Incremental Learning

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

no code implementations 21 Feb 2023 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f), 2)})$.
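
As a hedged illustration of the conjectured rate (my gloss on the paper's examples, not its formal definition of $\mathrm{Leap}$): a staircase such as $f(x) = x_1 + x_1x_2 + x_1x_2x_3$ can be built up one coordinate at a time, so $\mathrm{Leap}(f) = 1$ and the conjectured time is $\tilde\Theta(d^{2})$, whereas an isolated degree-3 parity $f(x) = x_1x_2x_3$ requires three fresh coordinates at once, so $\mathrm{Leap}(f) = 3$ and the conjectured time is $\tilde\Theta(d^{3})$.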

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

1 code implementation 30 Jan 2023 Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk

This paper considers the learning of logical (Boolean) functions with focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization.

Out-of-Distribution Generalization

On the non-universality of deep learning: quantifying the cost of symmetry

no code implementations 5 Aug 2022 Emmanuel Abbe, Enric Boix-Adsera

We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn.

An initial alignment between neural network and target is needed for gradient descent to learn

no code implementations 25 Feb 2022 Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła, Christopher Marquis

This paper introduces the notion of "Initial Alignment" (INAL) between a neural network at initialization and a target function.

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

no code implementations 17 Feb 2022 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints.

Binary perceptron: efficient algorithms can find solutions in a rare well-connected cluster

no code implementations 4 Nov 2021 Emmanuel Abbe, Shuangping Li, Allan Sly

It was recently shown that almost all solutions in the symmetric binary perceptron are isolated, even at low constraint densities, suggesting that finding typical solutions is hard.

The staircase property: How hierarchical structure can guide deep learning

no code implementations NeurIPS 2021 Emmanuel Abbe, Enric Boix-Adsera, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj

This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically.
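
To make the hierarchical structure concrete, here is a minimal Python sketch of a staircase-style target and toy training data; the function staircase, the dimension, and the sample size are illustrative choices of mine, not the paper's experimental setup.

import numpy as np

def staircase(x):
    # Degree-3 staircase target f(x) = x1 + x1*x2 + x1*x2*x3 on {-1,+1}^d,
    # an illustrative example in the spirit of the paper, not its formal setup.
    return x[:, 0] + x[:, 0] * x[:, 1] + x[:, 0] * x[:, 1] * x[:, 2]

# uniform Boolean inputs and labels for a toy learning experiment
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(10_000, 30))
y = staircase(X)
print(X.shape, y.shape)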

On the Power of Differentiable Learning versus PAC and SQ Learning

no code implementations NeurIPS 2021 Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro

With fine enough precision relative to the minibatch size, namely when $b \rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm, so its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$.

PAC learning

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

no code implementations 1 Mar 2021 Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro

Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.

Proof of the Contiguity Conjecture and Lognormal Limit for the Symmetric Perceptron

no code implementations 25 Feb 2021 Emmanuel Abbe, Shuangping Li, Allan Sly

We consider the symmetric binary perceptron model, a simple model of neural networks that has gathered significant attention in the statistical physics, information theory and probability theory communities, with recent connections made to the performance of learning algorithms in Baldassi et al. '15.
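
For context, the standard formulation of this model (phrasing mine, not quoted from the paper): a configuration $\sigma \in \{-1,+1\}^n$ is a solution of the symmetric binary perceptron with margin $\kappa$ if $|\langle g_a, \sigma \rangle| \le \kappa\sqrt{n}$ for every constraint $a = 1, \dots, m$, where the $g_a$ are i.i.d. standard Gaussian vectors and $\alpha = m/n$ is the constraint density.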

Stochastic block model entropy and broadcasting on trees with survey

no code implementations 29 Jan 2021 Emmanuel Abbe, Elisabetta Cornacchia, Yuzhou Gu, Yury Polyanskiy

The limit of the entropy in the stochastic block model (SBM) has been characterized in the sparse regime for the special case of disassortative communities [COKPZ17] and for the classical case of assortative communities but in the dense regime [DAM16].

Probability Information Theory

On the universality of deep learning

no code implementations NeurIPS 2020 Emmanuel Abbe, Colin Sandon

This paper shows that deep learning, i.e., neural networks trained by SGD, can learn in polytime any function class that can be learned in polytime by some algorithm, including parities.

Maximum Multiscale Entropy and Neural Network Regularization

no code implementations 25 Jun 2020 Amir R. Asadi, Emmanuel Abbe

For different entropies and arbitrary scale transformations, it is shown that the distribution maximizing a multiscale entropy is characterized by a procedure analogous to the renormalization group procedure in statistical physics.

Density Estimation

An $\ell_p$ theory of PCA and spectral clustering

no code implementations 24 Jun 2020 Emmanuel Abbe, Jianqing Fan, Kaizheng Wang

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning.

Clustering Community Detection

Learning Sparse Graphons and the Generalized Kesten-Stigum Threshold

no code implementations 13 Jun 2020 Emmanuel Abbe, Shuangping Li, Allan Sly

The problem of learning graphons has attracted considerable attention across several scientific communities, with significant progress over the recent years in sparser regimes.

Poly-time universality and limitations of deep learning

no code implementations 7 Jan 2020 Emmanuel Abbe, Colin Sandon

Therefore deep learning provides a universal learning paradigm: it was known that the approximation and estimation errors could be controlled with poly-size neural nets, using ERM, which is NP-hard; this new result shows that the optimization error can also be controlled with SGD in poly-time.

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets

1 code implementation 26 Jun 2019 Amir R. Asadi, Emmanuel Abbe

The bounds are obtained by introducing the notion of generated hierarchical coverings of neural nets and by using the technique of chaining mutual information introduced in Asadi et al. NeurIPS'18.

Provable limitations of deep learning

no code implementations 16 Dec 2018 Emmanuel Abbe, Colin Sandon

As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning.

Community Detection

Chaining Mutual Information and Tightening Generalization Bounds

no code implementations NeurIPS 2018 Amir R. Asadi, Emmanuel Abbe, Sergio Verdú

Two important difficulties are (i) exploiting the dependencies between the hypotheses, and (ii) exploiting the dependence between the algorithm's input and output.

Generalization Bounds

Communication-Computation Efficient Gradient Coding

no code implementations ICML 2018 Min Ye, Emmanuel Abbe

This paper develops coding techniques to reduce the running time of distributed learning tasks.

Nonbacktracking Bounds on the Influence in Independent Cascade Models

no code implementations NeurIPS 2017 Emmanuel Abbe, Sanjeev Kulkarni, Eun Jee Lee

This paper develops upper and lower bounds on the influence measure in a network, more precisely, the expected number of nodes that a seed set can influence in the independent cascade model.
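
For readers unfamiliar with the model, a minimal Python sketch that estimates the influence measure itself by Monte Carlo simulation; estimate_influence, the toy graph, and the activation probability are illustrative assumptions of mine, and this is not the paper's nonbacktracking bound computation.

import random

def estimate_influence(adj, seed_set, p, trials=2000, seed=0):
    # Monte Carlo estimate of the influence measure: the expected number of
    # nodes activated when each newly activated node gets a single chance to
    # activate each of its out-neighbours independently with probability p.
    # Illustrative estimator of the quantity the paper bounds, not the
    # paper's nonbacktracking bounds themselves.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set(seed_set)
        frontier = list(seed_set)
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, ()):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

# toy directed graph given as adjacency lists
adj = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}
print(estimate_influence(adj, seed_set={0}, p=0.5))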

Community Detection

Community Detection and Stochastic Block Models

no code implementations 29 Mar 2017 Emmanuel Abbe

This monograph surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational tradeoffs, and for various recovery requirements such as exact, partial and weak recovery.

Clustering Community Detection +1

Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation

no code implementations NeurIPS 2016 Emmanuel Abbe, Colin Sandon

The stochastic block model (SBM) has long been studied in machine learning and network science as a canonical model for clustering and community detection.
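
For concreteness, a minimal Python sketch of sampling from the sparse symmetric SBM described here; sample_sbm and its parameters are illustrative choices of mine, and the snippet shows the model only, not the linearized acyclic belief propagation algorithm of the paper.

import random

def sample_sbm(n, k, a, b, seed=0):
    # Sample from the sparse symmetric SBM: n vertices split into k balanced
    # communities; an edge appears within a community with probability a/n and
    # across communities with probability b/n. Illustrative sketch of the
    # model itself, not of the detection algorithm.
    rng = random.Random(seed)
    labels = [i % k for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = a / n if labels[i] == labels[j] else b / n
            if rng.random() < prob:
                edges.append((i, j))
    return labels, edges

labels, edges = sample_sbm(n=200, k=2, a=5.0, b=1.0)
print(len(edges), "edges")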

Clustering Community Detection +1

Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap

no code implementations 30 Dec 2015 Emmanuel Abbe, Colin Sandon

In a paper that initiated the modern study of the stochastic block model, Decelle et al., backed by Mossel et al., made the following conjecture: Denote by $k$ the number of balanced communities, $a/n$ the probability of connecting inside communities and $b/n$ across, and set $\mathrm{SNR}=(a-b)^2/(k(a+(k-1)b))$; for any $k \geq 2$, it is possible to detect communities efficiently whenever $\mathrm{SNR}>1$ (the KS threshold), whereas for $k\geq 4$, it is possible to detect communities information-theoretically for some $\mathrm{SNR}<1$.
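
A quick worked instance of the formula (my arithmetic, not an example from the paper): with $k = 2$, $a = 5$, $b = 1$, one gets $\mathrm{SNR} = (5-1)^2/(2(5+1)) = 16/12 \approx 1.33 > 1$, which the conjecture places above the KS threshold, so efficient detection should be possible; with $a = 3$, $b = 1$ instead, $\mathrm{SNR} = 4/8 = 0.5 < 1$.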

Clustering Stochastic Block Model

Recovering communities in the general stochastic block model without knowing the parameters

no code implementations NeurIPS 2015 Emmanuel Abbe, Colin Sandon

Most recent developments on the stochastic block model (SBM) rely on the knowledge of the model parameters, or at least on the number of communities.

Stochastic Block Model
