Search Results for author: Emmanuel Abbe

Found 28 papers, 4 papers with code

Transformers learn through gradual rank increase

no code implementations NeurIPS 2023 Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind

Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.

Incremental Learning

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

no code implementations 21 Feb 2023 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f), 2)})$.
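
As a hedged illustration of the conjectured rate (my gloss on the paper's examples, not its formal definition of $\mathrm{Leap}$): a staircase such as $f(x) = x_1 + x_1x_2 + x_1x_2x_3$ can be built up one coordinate at a time, so $\mathrm{Leap}(f) = 1$ and the conjectured time is $\tilde\Theta(d^{2})$, whereas an isolated degree-3 parity $f(x) = x_1x_2x_3$ requires three fresh coordinates at once, so $\mathrm{Leap}(f) = 3$ and the conjectured time is $\tilde\Theta(d^{3})$.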

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

1 code implementation 30 Jan 2023 Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk

This paper considers the learning of logical (Boolean) functions with focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization.

Out-of-Distribution Generalization

On the non-universality of deep learning: quantifying the cost of symmetry

no code implementations 5 Aug 2022 Emmanuel Abbe, Enric Boix-Adsera

We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn.

An initial alignment between neural network and target is needed for gradient descent to learn

no code implementations 25 Feb 2022 Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła, Christopher Marquis

This paper introduces the notion of "Initial Alignment" (INAL) between a neural network at initialization and a target function.

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

no code implementations 17 Feb 2022 Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints.

Binary perceptron: efficient algorithms can find solutions in a rare well-connected cluster

no code implementations 4 Nov 2021 Emmanuel Abbe, Shuangping Li, Allan Sly

It was recently shown that almost all solutions in the symmetric binary perceptron are isolated, even at low constraint densities, suggesting that finding typical solutions is hard.

The staircase property: How hierarchical structure can guide deep learning

no code implementations NeurIPS 2021 Emmanuel Abbe, Enric Boix-Adsera, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj

This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically.
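
To make the hierarchical structure concrete, here is a minimal Python sketch of a staircase-style target and toy training data; the function staircase, the dimension, and the sample size are illustrative choices of mine, not the paper's experimental setup.

import numpy as np

def staircase(x):
    # Degree-3 staircase target f(x) = x1 + x1*x2 + x1*x2*x3 on {-1,+1}^d,
    # an illustrative example in the spirit of the paper, not its formal setup.
    return x[:, 0] + x[:, 0] * x[:, 1] + x[:, 0] * x[:, 1] * x[:, 2]

# uniform Boolean inputs and labels for a toy learning experiment
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(10_000, 30))
y = staircase(X)
print(X.shape, y.shape)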

On the Power of Differentiable Learning versus PAC and SQ Learning

no code implementations NeurIPS 2021 Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro

With fine enough precision relative to the minibatch size, namely when $b \rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm, so its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$.

PAC learning

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

no code implementations 1 Mar 2021 Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro

Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.

Proof of the Contiguity Conjecture and Lognormal Limit for the Symmetric Perceptron

no code implementations 25 Feb 2021 Emmanuel Abbe, Shuangping Li, Allan Sly

We consider the symmetric binary perceptron model, a simple model of neural networks that has gathered significant attention in the statistical physics, information theory and probability theory communities, with recent connections made to the performance of learning algorithms in Baldassi et al. '15.
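
For context, the standard formulation of this model (phrasing mine, not quoted from the paper): a configuration $\sigma \in \{-1,+1\}^n$ is a solution of the symmetric binary perceptron with margin $\kappa$ if $|\langle g_a, \sigma \rangle| \le \kappa\sqrt{n}$ for every constraint $a = 1, \dots, m$, where the $g_a$ are i.i.d. standard Gaussian vectors and $\alpha = m/n$ is the constraint density.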

Stochastic block model entropy and broadcasting on trees with survey

no code implementations 29 Jan 2021 Emmanuel Abbe, Elisabetta Cornacchia, Yuzhou Gu, Yury Polyanskiy

The limit of the entropy in the stochastic block model (SBM) has been characterized in the sparse regime for the special case of disassortative communities [COKPZ17] and for the classical case of assortative communities but in the dense regime [DAM16].

Probability Information Theory

On the universality of deep learning

no code implementations NeurIPS 2020 Emmanuel Abbe, Colin Sandon

This paper shows that deep learning, i.e., neural networks trained by SGD, can learn in polytime any function class that can be learned in polytime by some algorithm, including parities.

Maximum Multiscale Entropy and Neural Network Regularization

no code implementations 25 Jun 2020 Amir R. Asadi, Emmanuel Abbe

For different entropies and arbitrary scale transformations, it is shown that the distribution maximizing a multiscale entropy is characterized by a procedure analogous to the renormalization group procedure in statistical physics.

Density Estimation

An $\ell_p$ theory of PCA and spectral clustering

no code implementations 24 Jun 2020 Emmanuel Abbe, Jianqing Fan, Kaizheng Wang

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning.

Clustering Community Detection

Learning Sparse Graphons and the Generalized Kesten-Stigum Threshold

no code implementations 13 Jun 2020 Emmanuel Abbe, Shuangping Li, Allan Sly

The problem of learning graphons has attracted considerable attention across several scientific communities, with significant progress over the recent years in sparser regimes.

Poly-time universality and limitations of deep learning

no code implementations 7 Jan 2020 Emmanuel Abbe, Colin Sandon

Therefore deep learning provides a universal learning paradigm: it was known that the approximation and estimation errors could be controlled with poly-size neural nets, using ERM, which is NP-hard; this new result shows that the optimization error can also be controlled with SGD in poly-time.

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets

1 code implementation 26 Jun 2019 Amir R. Asadi, Emmanuel Abbe

The bounds are obtained by introducing the notion of generated hierarchical coverings of neural nets and by using the technique of chaining mutual information introduced in Asadi et al. NeurIPS'18.

Provable limitations of deep learning

no code implementations 16 Dec 2018 Emmanuel Abbe, Colin Sandon

As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning.

Community Detection

Chaining Mutual Information and Tightening Generalization Bounds

no code implementations NeurIPS 2018 Amir R. Asadi, Emmanuel Abbe, Sergio Verdú

Two important difficulties are (i) exploiting the dependencies between the hypotheses, and (ii) exploiting the dependence between the algorithm's input and output.

Generalization Bounds

Communication-Computation Efficient Gradient Coding

no code implementations ICML 2018 Min Ye, Emmanuel Abbe

This paper develops coding techniques to reduce the running time of distributed learning tasks.

Nonbacktracking Bounds on the Influence in Independent Cascade Models

no code implementations NeurIPS 2017 Emmanuel Abbe, Sanjeev Kulkarni, Eun Jee Lee

This paper develops upper and lower bounds on the influence measure in a network, more precisely, the expected number of nodes that a seed set can influence in the independent cascade model.
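
For readers unfamiliar with the model, a minimal Python sketch that estimates the influence measure itself by Monte Carlo simulation; estimate_influence, the toy graph, and the activation probability are illustrative assumptions of mine, and this is not the paper's nonbacktracking bound computation.

import random

def estimate_influence(adj, seed_set, p, trials=2000, seed=0):
    # Monte Carlo estimate of the influence measure: the expected number of
    # nodes activated when each newly activated node gets a single chance to
    # activate each of its out-neighbours independently with probability p.
    # Illustrative estimator of the quantity the paper bounds, not the
    # paper's nonbacktracking bounds themselves.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set(seed_set)
        frontier = list(seed_set)
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, ()):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

# toy directed graph given as adjacency lists
adj = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}
print(estimate_influence(adj, seed_set={0}, p=0.5))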

Community Detection

Community Detection and Stochastic Block Models

no code implementations 29 Mar 2017 Emmanuel Abbe

This monograph surveys the recent developments that establish the fundamental limits for community detection in the SBM, both with respect to information-theoretic and computational tradeoffs, and for various recovery requirements such as exact, partial and weak recovery.

Clustering Community Detection +1

Achieving the KS threshold in the general stochastic block model with linearized acyclic belief propagation

no code implementations NeurIPS 2016 Emmanuel Abbe, Colin Sandon

The stochastic block model (SBM) has long been studied in machine learning and network science as a canonical model for clustering and community detection.
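
For concreteness, a minimal Python sketch of sampling from the sparse symmetric SBM described here; sample_sbm and its parameters are illustrative choices of mine, and the snippet shows the model only, not the linearized acyclic belief propagation algorithm of the paper.

import random

def sample_sbm(n, k, a, b, seed=0):
    # Sample from the sparse symmetric SBM: n vertices split into k balanced
    # communities; an edge appears within a community with probability a/n and
    # across communities with probability b/n. Illustrative sketch of the
    # model itself, not of the detection algorithm.
    rng = random.Random(seed)
    labels = [i % k for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = a / n if labels[i] == labels[j] else b / n
            if rng.random() < prob:
                edges.append((i, j))
    return labels, edges

labels, edges = sample_sbm(n=200, k=2, a=5.0, b=1.0)
print(len(edges), "edges")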

Clustering Community Detection +1

Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap

no code implementations 30 Dec 2015 Emmanuel Abbe, Colin Sandon

In a paper that initiated the modern study of the stochastic block model, Decelle et al., backed by Mossel et al., made the following conjecture: Denote by $k$ the number of balanced communities, $a/n$ the probability of connecting inside communities and $b/n$ across, and set $\mathrm{SNR}=(a-b)^2/(k(a+(k-1)b))$; for any $k \geq 2$, it is possible to detect communities efficiently whenever $\mathrm{SNR}>1$ (the KS threshold), whereas for $k\geq 4$, it is possible to detect communities information-theoretically for some $\mathrm{SNR}<1$.
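
A quick worked instance of the formula (my arithmetic, not an example from the paper): with $k = 2$, $a = 5$, $b = 1$, one gets $\mathrm{SNR} = (5-1)^2/(2(5+1)) = 16/12 \approx 1.33 > 1$, which the conjecture places above the KS threshold, so efficient detection should be possible; with $a = 3$, $b = 1$ instead, $\mathrm{SNR} = 4/8 = 0.5 < 1$.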

Clustering Stochastic Block Model

Recovering communities in the general stochastic block model without knowing the parameters

no code implementations NeurIPS 2015 Emmanuel Abbe, Colin Sandon

Most recent developments on the stochastic block model (SBM) rely on the knowledge of the model parameters, or at least on the number of communities.

Stochastic Block Model
