1 code implementation • 14 Mar 2024 • Enric Boix-Adsera
Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06, HVD15].
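As background on what that recipe typically looks like in practice, the classic soft-label approach of [HVD15] trains the student to match the teacher's temperature-softened output distribution. The following is a minimal sketch of that loss (an illustration, not this paper's method), assuming PyTorch logits from hypothetical teacher and student models:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # KL divergence to the teacher's temperature-softened distribution,
        # rescaled by T^2 to keep gradients comparable across temperatures [HVD15]
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # standard cross-entropy on the ground-truth hard labels
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard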
1 code implementation • 13 Nov 2023 • Rimon Melamed, Lucas H. McCabe, Tanay Wakhare, Yejin Kim, H. Howie Huang, Enric Boix-Adsera
We discover that many natural-language prompts can be replaced by corresponding prompts that are unintelligible to humans but that provably elicit similar behavior in language models.
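One generic way to make such a search concrete (a sketch under assumptions, not necessarily the paper's optimizer) is greedy discrete optimization over tokens: propose random swaps and keep those that reduce a divergence between the model's behavior under the candidate prompt and under the original natural-language prompt. Here behavior_divergence is a hypothetical scoring oracle, e.g. a KL divergence between the model's next-token distributions under the two prompts:

    import random

    def optimize_prompt(vocab, prompt_len, behavior_divergence, iters=500, seed=0):
        rng = random.Random(seed)
        # start from a random (typically unintelligible) token sequence
        prompt = [rng.choice(vocab) for _ in range(prompt_len)]
        best = behavior_divergence(prompt)
        for _ in range(iters):
            cand = prompt[:]
            cand[rng.randrange(prompt_len)] = rng.choice(vocab)  # single-token swap
            score = behavior_divergence(cand)
            if score < best:  # keep only improving swaps
                prompt, best = cand, score
        return prompt, best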
1 code implementation • 15 Oct 2023 • Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
We investigate the capabilities of transformer models on relational reasoning tasks.
no code implementations • NeurIPS 2023 • Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind
Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.
no code implementations • 22 May 2023 • Enric Boix-Adsera, Etai Littwin
We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss.
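For reference, the NTK approximation linearizes the network around its initialization $\theta_0$, so that training with the square loss reduces to kernel regression with the tangent kernel (standard background, not a result of this paper): $f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top(\theta - \theta_0)$, with kernel $K_{\mathrm{NTK}}(x,x') = \langle \nabla_\theta f(x;\theta_0), \nabla_\theta f(x';\theta_0)\rangle$.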
no code implementations • 21 Feb 2023 • Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz
For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f), 2)})$.
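As an illustration of the conjectured scaling (reading the leap as the largest degree jump needed to reach the function's monomials one at a time): the single monomial $f(x)=x_1x_2x_3$ has $\mathrm{Leap}(f)=3$ and time $\tilde\Theta(d^3)$, while the staircase $f(x)=x_1+x_1x_2+x_1x_2x_3$ can be learned one coordinate at a time, has $\mathrm{Leap}(f)=1$, and falls into the $\tilde\Theta(d^2)$ regime.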
1 code implementation • 12 Oct 2022 • Enric Boix-Adsera, Hannah Lawrence, George Stepaniants, Philippe Rigollet
Comparing the representations learned by different neural networks has recently emerged as a key tool for understanding various architectures and, ultimately, optimizing them.
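For a sense of what such a comparison looks like in code, the snippet below computes linear CKA, one widely used representation-similarity baseline; it is shown for orientation only and is not the metric introduced in this paper:

    import numpy as np

    def linear_cka(X, Y):
        # X: (n, p) and Y: (n, q) representations of the same n inputs
        X = X - X.mean(axis=0)  # center each feature dimension
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
        return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))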
no code implementations • 5 Aug 2022 • Emmanuel Abbe, Enric Boix-Adsera
We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn.
no code implementations • 17 Feb 2022 • Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz
It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints.
no code implementations • NeurIPS 2021 • Emmanuel Abbe, Enric Boix-Adsera, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj
This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically.
no code implementations • 7 Jun 2021 • Enric Boix-Adsera, Guy Bresler, Frederic Koehler
In this paper, we introduce a new algorithm that carefully combines elements of the Chow-Liu algorithm with tree metric reconstruction methods to efficiently and optimally learn tree Ising models under a prediction-centric loss.
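For context, the Chow-Liu ingredient referenced here is classical: estimate pairwise mutual information from samples, then take a maximum-weight spanning tree. The sketch below shows only that step (assuming discrete data and the networkx library), not the paper's combined prediction-centric algorithm:

    import numpy as np
    import networkx as nx

    def empirical_mi(x, y):
        # plug-in estimate of the mutual information between two discrete columns
        mi = 0.0
        for a in np.unique(x):
            for b in np.unique(y):
                pxy = np.mean((x == a) & (y == b))
                px, py = np.mean(x == a), np.mean(y == b)
                if pxy > 0:
                    mi += pxy * np.log(pxy / (px * py))
        return mi

    def chow_liu_tree(samples):
        # maximum-weight spanning tree on empirical pairwise mutual information
        _, d = samples.shape
        G = nx.Graph()
        for i in range(d):
            for j in range(i + 1, d):
                G.add_edge(i, j, weight=empirical_mi(samples[:, i], samples[:, j]))
        return nx.maximum_spanning_tree(G)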
no code implementations • 4 Jan 2021 • Jason M. Altschuler, Enric Boix-Adsera
Moreover, our hardness results for computing Wasserstein barycenters extend to approximate computation, to seemingly simple cases of the problem, and to averaging probability distributions in other Optimal Transport metrics.
no code implementations • 10 Dec 2020 • Jason M. Altschuler, Enric Boix-Adsera
We demonstrate this toolkit by using it to establish the intractability of a number of MOT problems studied in the literature that have resisted previous algorithmic efforts.
1 code implementation • 7 Aug 2020 • Jason M. Altschuler, Enric Boix-Adsera
We illustrate this ease-of-use by developing poly(n, k) time algorithms for three general classes of MOT cost structures: (1) graphical structure; (2) set-optimization structure; and (3) low-rank plus sparse structure.
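For context on why poly(n, k) is the benchmark: MOT is the linear program $\min_P \sum_{j_1, \dots, j_k} C_{j_1 \dots j_k} P_{j_1 \dots j_k}$ over joint distributions $P$ with $k$ prescribed marginals on $n$ atoms each, so the variable $P$ has $n^k$ entries and a poly(n, k) algorithm must exploit structure in the cost $C$ rather than write the LP down explicitly.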
no code implementations • 14 Jun 2020 • Jason M. Altschuler, Enric Boix-Adsera
Computing Wasserstein barycenters is a fundamental geometric problem with widespread applications in machine learning, statistics, and computer graphics.
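As a concrete point of reference (a standard method shown for illustration, not this paper's algorithm), fixed-support entropic barycenters can be computed by iterative Bregman projections in the style of Benamou et al. (2015). The NumPy sketch below assumes k input distributions supported on the same n points; smaller reg gives sharper barycenters at the cost of numerical stability:

    import numpy as np

    def entropic_barycenter(A, M, reg=0.05, weights=None, iters=200):
        # A: (n, k) columns are probability vectors on a shared n-point support
        # M: (n, n) ground-cost matrix between support points
        n, k = A.shape
        w = np.full(k, 1.0 / k) if weights is None else weights
        K = np.exp(-M / reg)       # Gibbs kernel
        U = np.ones((n, k))
        for _ in range(iters):
            V = A / (K.T @ U)      # scale plans to match each input marginal
            KV = K @ V
            # barycenter update: weighted geometric mean of current row marginals
            b = np.exp((np.log(U * KV) * w).sum(axis=1))
            U = b[:, None] / KV    # rescale plans to share the barycenter marginal
        return b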