no code implementations • 9 Dec 2024 • Eshaan Nichani, Jason D. Lee, Alberto Bietti
We begin by proving that the storage capacities of both linear and MLP associative memories scale linearly with parameter count.
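A minimal sketch of the linear associative memory studied in this setting, under illustrative assumptions of my own (random Gaussian key and value embeddings, recall by a single matrix-vector product):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 256, 50                         # embedding dimension, number of stored pairs

# Random, approximately orthogonal key and value embeddings.
keys = rng.standard_normal((N, d)) / np.sqrt(d)
values = rng.standard_normal((N, d)) / np.sqrt(d)

# A linear associative memory stores pairs as a sum of outer products v_i k_i^T.
W = values.T @ keys                    # shape (d, d)

# Retrieval: W @ key ~= value, up to cross-talk from the other stored pairs.
recovered = W @ keys[0]
cos = recovered @ values[0] / (np.linalg.norm(recovered) * np.linalg.norm(values[0]))
print(f"cosine similarity with stored value: {cos:.3f}")
```

The cross-talk noise grows with the number of stored pairs relative to the d^2 parameters of W, which is the quantity whose scaling the paper makes precise.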
no code implementations • 5 Jun 2024 • Lei Chen, Joan Bruna, Alberto Bietti
In addition to the ability to generate fluent text in various languages, large language models have been successful at tasks that involve basic forms of logical "reasoning" over their context.
no code implementations • 30 May 2024 • Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, Miles Cranmer, Keiya Hirashima, Geraud Krawezik, Nicholas Lourie, Michael McCabe, Rudy Morel, Ruben Ohana, Liam Holden Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho
Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications.
no code implementations • 5 Mar 2024 • Aaron Mishkin, Alberto Bietti, Robert M. Gower
We study level set teleportation, an optimization subroutine that seeks to accelerate gradient methods by maximizing the gradient norm on a level set of the objective function.
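A toy illustration of the idea (my own construction, not the paper's algorithm): on a quadratic objective, repeatedly take an ascent step on the squared gradient norm, then project back onto the original level set with a few Newton steps.

```python
import numpy as np

A = np.diag([10.0, 1.0])               # ill-conditioned quadratic f(x) = 0.5 x^T A x

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

x = np.array([0.1, 3.0])
c = f(x)                               # the level set we must stay on
eta = 1e-3

for _ in range(500):
    # Ascent step on the squared gradient norm g(x) = ||grad f(x)||^2.
    x = x + eta * 2 * (A @ grad(x))    # here grad g(x) = 2 A^T A x = 2 A^2 x
    for _ in range(3):                 # a few Newton steps back onto {f = c}
        g = grad(x)
        x = x - (f(x) - c) / (g @ g) * g

print(f"f(x) = {f(x):.4f} (target {c:.4f}), |grad f| = {np.linalg.norm(grad(x)):.3f}")
```

The point of teleporting to a higher-gradient-norm point on the same level set is that the next gradient descent step can then make much larger progress on the objective.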
no code implementations • 29 Feb 2024 • Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti
We show that a key factor in the performance gap between Adam and gradient descent on language models is the heavy-tailed class imbalance found in language tasks.
no code implementations • 28 Feb 2024 • Vivien Cabannes, Berfin Simsek, Alberto Bietti
This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings.
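A minimal sketch of such a module and its gradient descent dynamics, under toy assumptions of my own (fixed random embeddings, a simple recall task, full-batch cross-entropy):

```python
import numpy as np

rng = np.random.default_rng(1)
d, V = 64, 10                          # embedding dim, vocabulary size
E = rng.standard_normal((V, d)) / np.sqrt(d)   # fixed input token embeddings
U = rng.standard_normal((V, d)) / np.sqrt(d)   # fixed output token embeddings
pairs = [(x, x) for x in range(V)]     # toy task: recall y = x

W = np.zeros((d, d))                   # the associative memory module
lr = 1.0
for _ in range(100):
    gradW = np.zeros_like(W)
    for x, y in pairs:
        logits = U @ W @ E[x]
        p = np.exp(logits - logits.max()); p /= p.sum()
        # The cross-entropy gradient is a combination of outer products u_k e_x^T.
        gradW += np.outer(U[y] - p @ U, E[x])  # descent direction (negative gradient)
    W += lr * gradW / len(pairs)

acc = np.mean([np.argmax(U @ W @ E[x]) == y for x, y in pairs])
print(f"recall accuracy: {acc:.2f}")
```

Each gradient step adds a combination of outer products of token embeddings to W, which is what makes this parameterization natural for analyzing the training dynamics.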
no code implementations • 30 Oct 2023 • Alberto Bietti, Joan Bruna, Loucas Pillaud-Vivien
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
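Schematically, a multi-index model posits that the target depends on the input only through a few linear projections (notation mine):

```latex
f^\star(x) = g\big(\langle u_1, x\rangle, \dots, \langle u_k, x\rangle\big),
\qquad x \sim \mathcal{N}(0, I_d), \quad k \ll d .
```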
no code implementations • 4 Oct 2023 • Vivien Cabannes, Elvis Dohmatob, Alberto Bietti
Learning arguably involves the discovery and memorization of abstract rules.
1 code implementation • 4 Oct 2023 • Liam Parker, Francois Lanusse, Siavash Golkar, Leopoldo Sarra, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Geraud Krawezik, Michael McCabe, Ruben Ohana, Mariel Pettee, Bruno Regaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
These embeddings can then be used, without any model fine-tuning, for a variety of downstream tasks, including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation from both images and spectra, and (4) morphology classification.
2 code implementations • 4 Oct 2023 • Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically dense scientific datasets.
1 code implementation • 4 Oct 2023 • Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, Mariel Pettee, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers.
no code implementations • 6 Feb 2023 • Vivien Cabannes, Bobak T. Kiani, Randall Balestriero, Yann Lecun, Alberto Bietti
Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision.
no code implementations • 7 Nov 2022 • Vivien Cabannes, Alberto Bietti, Randall Balestriero
Unsupervised representation learning aims at describing raw data efficiently to solve various downstream tasks.
no code implementations • 27 Oct 2022 • Alberto Bietti, Joan Bruna, Clayton Sanford, Min Jae Song
Single-index models are a class of functions given by an unknown univariate "link" function applied to an unknown one-dimensional projection of the input.
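Concretely (notation mine):

```latex
f(x) = \varphi\big(\langle w, x\rangle\big),
\qquad \varphi : \mathbb{R} \to \mathbb{R} \ \text{and} \ w \in \mathbb{S}^{d-1} \ \text{both unknown}.
```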
1 code implementation • 2 Jun 2022 • David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna
Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL).
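A deliberately tiny tabular sketch of the RCSL idea (the task and return-binning scheme are my own inventions): fit a conditional policy p(a | s, R) on logged (state, action, return) data by supervised learning, then act by conditioning on a high target return.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

# Logged offline data: (state, action, return) triples from a random behavior
# policy on a toy problem where action s % 2 is the good one in state s.
data = [(s, a, float(a == s % 2) + 0.1 * rng.standard_normal())
        for s in rng.integers(0, 3, 1000) for a in [rng.integers(0, 2)]]

# RCSL: supervised learning of p(a | s, R), here by counting over return bins.
counts = defaultdict(lambda: np.zeros(2))
for s, a, R in data:
    counts[(s, round(R))][a] += 1

def policy(s, target_return):
    """Act by conditioning on the (high) return we want to achieve."""
    c = counts[(s, round(target_return))]
    return int(np.argmax(c)) if c.sum() > 0 else 0

print([policy(s, target_return=1) for s in range(3)])   # recovers a = s % 2
```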
no code implementations • 22 Mar 2022 • Elvis Dohmatob, Alberto Bietti
To better understand the factors behind adversarial vulnerability, we provide a precise study of adversarial robustness in different scenarios, from initialization to the end of training in different regimes, as well as in intermediate scenarios where initialization still plays a role due to "lazy" training.
1 code implementation • 11 Feb 2022 • Houssam Zenati, Alberto Bietti, Eustache Diemert, Julien Mairal, Matthieu Martin, Pierre Gaillard
While standard methods require O(CT^3) complexity, where T is the horizon and the constant C relates to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems.
1 code implementation • 10 Feb 2022 • Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu
Large-scale machine learning systems often involve data distributed across a collection of users.
no code implementations • NeurIPS 2021 • Alberto Bietti, Luca Venturi, Joan Bruna
Many supervised learning problems involve high-dimensional data such as images, text, or graphs.
no code implementations • 11 Jul 2021 • Carles Domingo-Enrich, Alberto Bietti, Marylou Gabrié, Joan Bruna, Eric Vanden-Eijnden
In the feature-learning regime, this dual formulation justifies using a two time-scale gradient ascent-descent (GDA) training algorithm in which one concurrently updates the particles in the sample space and the neurons in the parameter space of the energy.
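A schematic toy version of such two time-scale training (my own construction, with Langevin updates standing in for the sample-space particle dynamics and a one-parameter Gaussian energy):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy energy E_theta(x) = 0.5 * (x - theta)^2 ; data drawn from N(2, 1).
data = 2.0 + rng.standard_normal(1000)
theta = -1.0

# Negative "model" particles evolve on a fast timescale, while theta follows
# the usual contrastive EBM gradient on a slow timescale.
particles = rng.standard_normal(256)
lr_particles, lr_theta = 0.1, 0.01

for _ in range(2000):
    # Fast: particles descend the energy plus Gaussian noise (Langevin dynamics).
    particles += -lr_particles * (particles - theta) \
                 + np.sqrt(2 * lr_particles) * rng.standard_normal(256)
    # Slow: contrastive gradient, dE/dtheta averaged over data minus particles.
    grad_theta = -(data - theta).mean() + (particles - theta).mean()
    theta -= lr_theta * grad_theta

print(f"learned theta = {theta:.2f} (target 2.0)")
```

The fast inner dynamics keeps the particles close to the current model distribution, so the slow update on theta sees a nearly unbiased contrastive gradient.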
1 code implementation • NeurIPS 2021 • Nicolas Keriven, Alberto Bietti, Samuel Vaiter
In the large graph limit, GNNs are known to converge to certain "continuous" models known as c-GNNs, which directly enables a study of their approximation power on random graph models.
1 code implementation • 15 Apr 2021 • Carles Domingo-Enrich, Alberto Bietti, Eric Vanden-Eijnden, Joan Bruna
Energy-based models (EBMs) are a simple yet powerful framework for generative modeling.
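For reference, an EBM turns an arbitrary energy function into a probability density:

```latex
p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)},
\qquad Z(\theta) = \int \exp(-E_\theta(x))\, dx ,
```

with the intractable normalizer Z(\theta) being the source of most training difficulties.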
1 code implementation • ICLR 2022 • Alberto Bietti
The empirical success of deep convolutional networks on tasks involving high-dimensional data such as images or audio suggests that they can efficiently approximate certain functions that are well-suited for such tasks.
1 code implementation • ICLR 2021 • Alberto Bietti, Francis Bach
Deep networks are often considered to be more expressive than shallow ones in terms of approximation.
1 code implementation • NeurIPS 2020 • Nicolas Keriven, Alberto Bietti, Samuel Vaiter
We study properties of Graph Convolutional Networks (GCNs) by analyzing their behavior on standard models of random graphs, where nodes are represented by random latent variables and edges are drawn according to a similarity kernel.
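A minimal instance of this random-graph setup (my choices: Gaussian latent variables and a Gaussian similarity kernel), followed by one symmetrically normalized GCN propagation step:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 2                      # number of nodes, latent dimension

# Random graph model: latent variables plus a similarity kernel for edge probabilities.
z = rng.standard_normal((n, k))
K = np.exp(-((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))   # Gaussian kernel
A = (rng.random((n, n)) < K).astype(float)
A = np.triu(A, 1); A = A + A.T                                # undirected, no self-loops

# One GCN propagation step: symmetrically normalized neighborhood averaging.
deg = A.sum(1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1)))
H = D_inv_sqrt @ A @ D_inv_sqrt @ z                           # node features = latents here

print(H.shape)   # (200, 2): smoothed node features
```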
1 code implementation • 22 Apr 2020 • Houssam Zenati, Alberto Bietti, Matthieu Martin, Eustache Diemert, Pierre Gaillard, Julien Mairal
Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare.
1 code implementation • NeurIPS 2019 • Alberto Bietti, Julien Mairal
State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties.
1 code implementation • 30 Sep 2018 • Alberto Bietti, Grégoire Mialon, Dexiong Chen, Julien Mairal
We propose a new point of view for regularizing deep neural networks by using the norm of a reproducing kernel Hilbert space (RKHS).
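The paper discusses several practical approximations of the RKHS-norm penalty; the sketch below shows just one gradient-penalty-style surrogate, under my assumption of a scalar-output PyTorch model:

```python
import torch

def gradient_penalty(model, x):
    """Penalize ||grad_x f(x)||^2 over a batch, one practical surrogate
    for controlling the RKHS norm of a scalar-output network."""
    x = x.clone().requires_grad_(True)
    out = model(x).sum()
    (grad_x,) = torch.autograd.grad(out, x, create_graph=True)
    return (grad_x.reshape(len(x), -1).norm(dim=1) ** 2).mean()

# Usage: loss = task_loss + lam * gradient_penalty(model, inputs)
```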
1 code implementation • 12 Feb 2018 • Alberto Bietti, Alekh Agarwal, John Langford
Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems.
no code implementations • NeurIPS 2017 • Alberto Bietti, Julien Mairal
In this paper, we study deep signal representations that are near-invariant to groups of transformations and stable to the action of diffeomorphisms without losing signal information.
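Schematically, stability statements in this line of work bound the effect of a deformation L_tau x(u) = x(u - tau(u)) in a form like the following (the precise constants and norms differ in the actual theorems):

```latex
\| \Phi(L_\tau x) - \Phi(x) \| \le \big( C_1 \|\nabla \tau\|_\infty + C_2 \|\tau\|_\infty \big)\, \|x\| ,
```

so that small, smooth deformations move the representation only slightly.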
1 code implementation • 9 Jun 2017 • Alberto Bietti, Julien Mairal
The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals.
1 code implementation • NeurIPS 2017 • Alberto Bietti, Julien Mairal
Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions.
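For context, a sketch of classic SVRG-style variance reduction on a least-squares finite sum (the textbook algorithm, not the paper's extension beyond finite sums):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 500, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def grad_i(w, i):                      # gradient of 0.5 * (x_i^T w - y_i)^2
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
lr = 0.01
for epoch in range(20):
    w_snap = w.copy()
    full_grad = (X.T @ (X @ w_snap - y)) / n     # anchor: full gradient at snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Variance-reduced stochastic gradient: unbiased, with shrinking variance.
        g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
        w -= lr * g

print(f"final loss: {0.5 * np.mean((X @ w - y) ** 2):.4f}")
```

The correction term grad_i(w, i) - grad_i(w_snap, i) + full_grad keeps the stochastic gradient unbiased while its variance shrinks as w approaches the snapshot, enabling constant step sizes.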