Search Results for author: Enric Boix-Adsera

Found 15 papers, 5 papers with code

Towards a theory of model distillation

1 code implementation • 14 Mar 2024 • Enric Boix-Adsera

Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06, HVD15].

PAC learning
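
As a concrete illustration of the task (a minimal sketch, ours; the paper develops a PAC-style theory of distillation rather than this recipe, and the model choices below are illustrative stand-ins):

```python
# Minimal distillation sketch: fit a simple "student" to the labels of a
# complicated "teacher" model. Model choices here are illustrative only.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

teacher = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                        random_state=0).fit(X, y)

# Distillation: the student learns from the teacher's predictions,
# not from the ground-truth labels.
student = DecisionTreeClassifier(max_depth=5, random_state=0)
student.fit(X, teacher.predict(X))

agreement = (student.predict(X) == teacher.predict(X)).mean()
print(f"student-teacher agreement: {agreement:.3f}")
```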

PROPANE: Prompt design as an inverse problem

1 code implementation • 13 Nov 2023 • Rimon Melamed, Lucas H. McCabe, Tanay Wakhare, Yejin Kim, H. Howie Huang, Enric Boix-Adsera

Carefully-designed prompts are key to inducing desired behavior in Large Language Models (LLMs).
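
The inverse-problem framing: given a desired output, search for a prompt that makes the model likely to produce it. Below is a hedged sketch (ours) that scores candidate prompts by the negative log-likelihood a small causal LM assigns to a fixed target continuation; this is generic search for illustration, not the PROPANE algorithm itself.

```python
# Hedged sketch: treat prompt design as an inverse problem by scoring
# candidate prompts with the NLL a small causal LM assigns to a desired
# target continuation. Illustration only; not the PROPANE algorithm.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

target = " The capital of France is Paris."   # desired behavior

def nll_of_target(prompt: str) -> float:
    ids = tok(prompt + target, return_tensors="pt").input_ids
    n_prompt = len(tok(prompt).input_ids)
    with torch.no_grad():
        logits = model(ids).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    nxt = ids[0, 1:]                          # token predicted at each step
    per_tok = -logp[torch.arange(nxt.numel()), nxt]
    return per_tok[n_prompt - 1:].mean().item()   # NLL of the target only

candidates = ["Geography quiz:", "Complete the fact:", "Unrelated words:"]
print(min(candidates, key=nll_of_target))
```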

Transformers learn through gradual rank increase

no code implementations • NeurIPS 2023 • Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind

Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.

Incremental Learning
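
A hedged toy illustration of the diagnostic behind the title (ours, not the paper's transformer setting): with small initialization, a two-layer linear net fit to a low-rank target tends to pick up the target's singular directions in stages, so the numerical rank of the learned matrix increases gradually during training.

```python
# Track the numerical rank of the learned matrix over training in a
# small-initialization deep linear toy model; rank typically rises in stages.
import torch

torch.manual_seed(0)
d, r = 32, 4
target = torch.randn(d, r) @ torch.randn(r, d) / r     # rank-4 target

W1 = torch.nn.Parameter(1e-3 * torch.randn(d, d))
W2 = torch.nn.Parameter(1e-3 * torch.randn(d, d))
opt = torch.optim.SGD([W1, W2], lr=0.2)

def numerical_rank(M, tol=1e-2):
    return int((torch.linalg.svdvals(M) > tol).sum())   # absolute threshold

for step in range(8001):
    loss = ((W2 @ W1 - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(step, numerical_rank((W2 @ W1).detach()))
```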

Tight conditions for when the NTK approximation is valid

no code implementations • 22 May 2023 • Enric Boix-Adsera, Etai Littwin

We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss.

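For reference, the NTK (lazy-training) approximation replaces the network by its first-order Taylor expansion in the parameters around initialization:

$$f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),$$

so the question of validity is when gradient descent on the square loss $\frac{1}{2}\sum_i (f(x_i;\theta)-y_i)^2$ stays close to gradient descent on the same loss with $f$ replaced by $f_{\mathrm{lin}}$.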

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

no code implementations • 21 Feb 2023 • Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture states that the time complexity to learn a function $f$ with low-dimensional support is $\tilde\Theta (d^{\max(\mathrm{Leap}(f), 2)})$.
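To instantiate the formula (example ours, using the paper's notion of leap as the largest number of new coordinates introduced at any step of the monomial hierarchy): the staircase $f(x) = x_1 + x_1x_2 + x_1x_2x_3$ has $\mathrm{Leap}(f) = 1$ and hence conjectured complexity $\tilde\Theta(d^2)$, while the isolated parity $f(x) = x_1x_2x_3$ has $\mathrm{Leap}(f) = 3$ and conjectured complexity $\tilde\Theta(d^3)$.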

GULP: a prediction-based metric between representations

1 code implementation • 12 Oct 2022 • Enric Boix-Adsera, Hannah Lawrence, George Stepaniants, Philippe Rigollet

Comparing the representations learned by different neural networks has recently emerged as a key tool to understand various architectures and ultimately optimize them.
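
A hedged sketch of the prediction-based idea (ours): train the same ridge regression on two representations of the same inputs and compare the resulting predictions. GULP itself takes a worst case over bounded-norm regression tasks and has a closed-form estimator, which is not reproduced here.

```python
# Prediction-based comparison of two representations via a shared ridge task.
# Illustrates the flavor of GULP, not the paper's exact estimator.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, d1, d2 = 1000, 64, 48
rep_a = rng.normal(size=(n, d1))                        # network A's features
rep_b = rep_a[:, :d2] + 0.1 * rng.normal(size=(n, d2))  # network B's features
task = rng.normal(size=n)                               # a downstream target

pred_a = Ridge(alpha=1.0).fit(rep_a, task).predict(rep_a)
pred_b = Ridge(alpha=1.0).fit(rep_b, task).predict(rep_b)
print("mean squared prediction gap:", np.mean((pred_a - pred_b) ** 2))
```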

On the non-universality of deep learning: quantifying the cost of symmetry

no code implementations • 5 Aug 2022 • Emmanuel Abbe, Enric Boix-Adsera

We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn.

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

no code implementations • 17 Feb 2022 • Emmanuel Abbe, Enric Boix-Adsera, Theodor Misiakiewicz

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints.

The staircase property: How hierarchical structure can guide deep learning

no code implementations • NeurIPS 2021 • Emmanuel Abbe, Enric Boix-Adsera, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj

This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically.
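The paper's eponymous example (stated here for intuition) is the staircase function $f(x) = x_1 + x_1x_2 + x_1x_2x_3 + \cdots + x_1x_2\cdots x_k$ over $x \in \{\pm 1\}^d$: each monomial extends the previous one by a single new coordinate, so gradient-based training can climb the hierarchy one degree at a time.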

Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models

no code implementations • 7 Jun 2021 • Enric Boix-Adsera, Guy Bresler, Frederic Koehler

In this paper, we introduce a new algorithm that carefully combines elements of the Chow-Liu algorithm with tree metric reconstruction methods to efficiently and optimally learn tree Ising models under a prediction-centric loss.
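
For context, a minimal sketch of the classical Chow-Liu step the paper starts from (ours; Chow-Liu++ additionally incorporates tree-metric reconstruction methods, which are not sketched here): estimate pairwise mutual information and return a maximum-weight spanning tree.

```python
# Classical Chow-Liu sketch for +/-1 spins: pairwise empirical mutual
# information, then a maximum-weight spanning tree (Kruskal + union-find).
import numpy as np

def mutual_info(a, b):
    # Empirical mutual information of two +/-1-valued sample vectors.
    mi = 0.0
    for va in (-1, 1):
        for vb in (-1, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):                  # samples: (n, d) array of +/-1
    d = samples.shape[1]
    edges = sorted(((mutual_info(samples[:, i], samples[:, j]), i, j)
                    for i in range(d) for j in range(i + 1, d)), reverse=True)
    parent = list(range(d))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:                    # Kruskal on descending MI
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```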

Wasserstein barycenters are NP-hard to compute

no code implementations • 4 Jan 2021 • Jason M. Altschuler, Enric Boix-Adsera

Moreover, our hardness results for computing Wasserstein barycenters extend to approximate computation, to seemingly simple cases of the problem, and to averaging probability distributions in other Optimal Transport metrics.

Open-Ended Question Answering

Hardness results for Multimarginal Optimal Transport problems

no code implementations • 10 Dec 2020 • Jason M. Altschuler, Enric Boix-Adsera

We demonstrate this toolkit by using it to establish the intractability of a number of MOT problems studied in the literature that have resisted previous algorithmic efforts.

Polynomial-time algorithms for Multimarginal Optimal Transport problems with structure

1 code implementation • 7 Aug 2020 • Jason M. Altschuler, Enric Boix-Adsera

We illustrate this ease-of-use by developing poly(n, k) time algorithms for three general classes of MOT cost structures: (1) graphical structure; (2) set-optimization structure; and (3) low-rank plus sparse structure.

BIG-bench Machine Learning
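
To see why structure matters (an illustrative sketch, ours): the naive LP formulation of multimarginal optimal transport has $n^k$ variables, so even writing it down is exponential in the number of marginals $k$. The paper's poly(n, k) algorithms exploit cost structure to avoid this blow-up; below is the naive baseline for tiny n and k.

```python
# Naive MOT as a linear program: n**k variables, one equality constraint per
# marginal entry. Feasible only for tiny n, k; shown to motivate structure.
import numpy as np
from scipy.optimize import linprog

n, k = 3, 3
rng = np.random.default_rng(0)
C = rng.random((n,) * k)                    # arbitrary dense cost tensor
marginals = [np.full(n, 1 / n) for _ in range(k)]

A_eq, b_eq = [], []
for axis in range(k):
    for v in range(n):
        mask = np.zeros((n,) * k)
        idx = [slice(None)] * k
        idx[axis] = v
        mask[tuple(idx)] = 1.0              # sums coupling over other axes
        A_eq.append(mask.ravel())
        b_eq.append(marginals[axis][v])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None), method="highs")
print("optimal MOT cost:", res.fun)
```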

Wasserstein barycenters can be computed in polynomial time in fixed dimension

no code implementations • 14 Jun 2020 • Jason M. Altschuler, Enric Boix-Adsera

Computing Wasserstein barycenters is a fundamental geometric problem with widespread applications in machine learning, statistics, and computer graphics.

BIG-bench Machine Learning
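
For intuition about the object itself (a 1-D special case, ours; the paper's results concern the much harder fixed-dimension problem in $\mathbb{R}^d$): in one dimension the 2-Wasserstein barycenter has a closed form, the average of the input distributions' quantile functions.

```python
# 1-D Wasserstein barycenter sketch: average the empirical quantile
# functions of the input distributions.
import numpy as np

rng = np.random.default_rng(0)
samples = [rng.normal(-2.0, 1.0, 5000), rng.normal(3.0, 0.5, 5000)]

qs = np.linspace(0.005, 0.995, 200)               # quantile grid
barycenter_quantiles = np.mean([np.quantile(s, qs) for s in samples], axis=0)
print("barycenter mean:", barycenter_quantiles.mean())  # ~ (-2 + 3) / 2
```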
