no code implementations • ICML 2020 • Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala
We demonstrate that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization.
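Schematically, the decomposition at stake takes the form below (the notation is ours, not the paper's exact one); the "latter two contributions" mentioned above are variance terms of this kind:

\[
\mathbb{E}\!\left[\mathcal{E}_{\mathrm{test}}\right] \;=\; \underbrace{B^{2}}_{\text{bias}} \;+\; \underbrace{V_{\mathrm{init}} + V_{\mathrm{samp}} + V_{\mathrm{noise}}}_{\text{variances}} .
\]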
1 code implementation • 9 Oct 2023 • Stéphane d'Ascoli, Sören Becker, Alexander Mathis, Philippe Schwaller, Niki Kilbertus
We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory.
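As a toy illustration of the task (not of ODEFormer's actual interface), the sketch below generates a single solution trajectory of a two-dimensional system; the model's job is to map such observations back to symbolic right-hand sides:

```python
# A toy setup of the task ODEFormer targets (this is NOT ODEFormer's API):
# generate a single solution trajectory of a 2-D ODE system with scipy; the
# model must map the observed (t, x(t)) pairs back to symbolic expressions.
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, z, a=1.5, b=1.0, c=3.0, d=1.0):
    x, y = z
    return [a * x - b * x * y, -c * y + d * x * y]

# The single observed trajectory: the model's only input.
t_eval = np.linspace(0, 10, 200)
sol = solve_ivp(lotka_volterra, (0, 10), [1.0, 1.0], t_eval=t_eval)
observations = np.column_stack([sol.t, sol.y.T])   # shape (200, 3)

# The target output is the system in symbolic form, e.g.
#   dx/dt = 1.5*x - 1.0*x*y
#   dy/dt = -3.0*y + 1.0*x*y
```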
1 code implementation • 21 Sep 2023 • Stéphane d'Ascoli, Samy Bengio, Josh Susskind, Emmanuel Abbé
In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions.
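A minimal sketch of the underlying task (nothing here calls Boolformer itself): the input is a truth table, the output a symbolic Boolean formula that reproduces it.

```python
# Toy version of Boolean symbolic regression: observe a truth table, predict
# a formula, and score the prediction by its agreement with the observed rows.
from itertools import product

def target(x0, x1, x2):            # hidden function the model must recover
    return (x0 and not x1) or x2

truth_table = [(bits, target(*bits)) for bits in product([False, True], repeat=3)]

# A candidate symbolic prediction, say "(x0 & ~x1) | x2", evaluated row by row:
predicted = lambda x0, x1, x2: (x0 and not x1) or x2
accuracy = sum(predicted(*b) == y for b, y in truth_table) / len(truth_table)
print(f"accuracy = {accuracy:.2f}")   # 1.00 for an exact recovery
```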
no code implementations • 27 Jun 2023 • Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton
We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums.
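For intuition, here is a minimal sketch of the relative-position idea (shapes and names are ours, not the paper's): attention logits receive a learned bias that depends only on the offset between tokens, so the same pattern applies at any sequence length.

```python
# Minimal relative-position attention: a learned 1-D bias b[i - j] is added to
# the content logits, making the attention pattern translation-invariant.
import numpy as np

def attention_with_relative_bias(q, k, v, rel_bias):
    """q, k, v: (seq, dim); rel_bias: (2*seq - 1,) indexed by offset i - j."""
    seq, dim = q.shape
    logits = q @ k.T / np.sqrt(dim)
    offsets = np.arange(seq)[:, None] - np.arange(seq)[None, :]   # i - j
    logits += rel_bias[offsets + seq - 1]                         # bias lookup
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq, dim = 6, 8
out = attention_with_relative_bias(rng.normal(size=(seq, dim)),
                                   rng.normal(size=(seq, dim)),
                                   rng.normal(size=(seq, dim)),
                                   rng.normal(size=(2 * seq - 1,)))
```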
3 code implementations • 22 Apr 2022 • Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton
Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is difficult and usually involves a two-step procedure: predicting the "skeleton" of the expression up to the choice of numerical constants, then fitting the constants by optimizing a non-convex loss function.
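The sketch below illustrates the second step of this two-step procedure on a toy example (the skeleton and data are ours): given a predicted skeleton such as c0 * sin(c1 * x), the constants are recovered by minimizing a non-convex fitting loss.

```python
# Fitting the constants of a predicted skeleton by non-convex optimization.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-3, 3, 100)
y = 2.0 * np.sin(1.5 * x)                       # ground-truth function

def skeleton(c, x):                             # predicted "skeleton"
    return c[0] * np.sin(c[1] * x)

loss = lambda c: np.mean((skeleton(c, x) - y) ** 2)
fit = minimize(loss, x0=[1.0, 1.0])             # non-convex in c[1]
print(fit.x)   # ideally close to [2.0, 1.5]; initialization matters here
```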
no code implementations • 9 Feb 2022 • Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli
In this case, it is optimal to keep a large learning rate during the exploration phase to escape the non-convex region as quickly as possible, then use the convex criterion $\beta=1$ to converge rapidly to the solution.
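In schematic form (our notation, not the paper's), such a schedule keeps a large constant rate during exploration and then decays with the convex exponent:

\[
\eta(t) \;=\;
\begin{cases}
\eta_{0}, & t \le t_{\mathrm{explore}},\\[4pt]
\eta_{0}\,\bigl(t / t_{\mathrm{explore}}\bigr)^{-\beta}, & t > t_{\mathrm{explore}},
\end{cases}
\qquad \beta = 1 .
\]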
no code implementations • 12 Jan 2022 • Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton
Symbolic regression, i.e. predicting a function from the observation of its values, is well known to be a challenging task.
no code implementations • 10 Jun 2021 • Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos
Finally, we experiment with initializing the T-CNN from a partially trained CNN, and find that it reaches better performance than the corresponding hybrid model trained from scratch, while reducing training time.
9 code implementations • 19 Mar 2021 • Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun
We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information.
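A simplified sketch of this gating mechanism (our names and shapes; not the official ConViT implementation): each head mixes a content-based attention map with a convolution-like positional map via a learned gate.

```python
# Gated positional self-attention, simplified: attn is a sigma(lam)-weighted
# mixture of a positional map and a content map.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gpsa(q, k, v, pos_logits, lam):
    """q, k, v: (seq, dim); pos_logits: (seq, seq) positional scores."""
    gate = 1.0 / (1.0 + np.exp(-lam))                   # sigma(lam)
    content = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    positional = softmax(pos_logits)
    attn = (1.0 - gate) * content + gate * positional   # gated mixture
    return attn @ v

rng = np.random.default_rng(0)
seq, dim = 5, 4
q, k, v = (rng.normal(size=(seq, dim)) for _ in range(3))
# With lam large, attention starts nearly positional (local, convolution-like);
# training can lower lam so the head "escapes locality".
out = gpsa(q, k, v, pos_logits=rng.normal(size=(seq, seq)), lam=3.0)
```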
1 code implementation • NeurIPS 2021 • Stéphane d'Ascoli, Marylou Gabrié, Levent Sagun, Giulio Biroli
One of the central puzzles in modern machine learning is the ability of heavily overparametrized models to generalize well.
1 code implementation • 24 Nov 2020 • Maria Refinetti, Stéphane d'Ascoli, Ruben Ohana, Sebastian Goldt
Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to the ubiquitous backpropagation algorithm for training deep neural networks.
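For reference, here is a minimal numpy sketch of DFA on a two-layer network (sizes, loss, and data are arbitrary illustration choices): instead of backpropagating the error through the transpose of the forward weights, the hidden layer receives the output error through a fixed random matrix.

```python
# Direct Feedback Alignment on a toy two-layer regression network.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, lr = 10, 32, 1, 0.05
W1 = rng.normal(scale=0.3, size=(n_hid, n_in))
W2 = rng.normal(scale=0.3, size=(n_out, n_hid))
B = rng.normal(scale=0.3, size=(n_hid, n_out))   # fixed random feedback matrix

X = rng.normal(size=(200, n_in))
y = np.tanh(X @ rng.normal(size=(n_in, n_out)))  # toy regression target

for _ in range(500):
    h = np.tanh(X @ W1.T)                        # forward pass
    out = h @ W2.T
    e = out - y                                  # output error
    dh = (e @ B.T) * (1 - h ** 2)                # DFA: B replaces W2.T
    W2 -= lr * e.T @ h / len(X)
    W1 -= lr * dh.T @ X / len(X)

print(f"final MSE: {np.mean((np.tanh(X @ W1.T) @ W2.T - y) ** 2):.4f}")
```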
1 code implementation • 3 Nov 2020 • Stéphane d'Ascoli, Alice Coucke, Francesco Caltagirone, Alexandre Caulier, Marc Lelarge
Scarcity of training data for task-oriented dialogue systems is a well-known problem that is usually tackled with costly and time-consuming manual data annotation.
1 code implementation • NeurIPS 2020 • Stéphane d'Ascoli, Levent Sagun, Giulio Biroli
We show that this peak is implicitly regularized by the nonlinearity, which is why it only becomes salient at high noise and is weakly affected by explicit regularization.
2 code implementations • 2 Mar 2020 • Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala
We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant.
1 code implementation • NeurIPS 2019 • Stéphane d'Ascoli, Levent Sagun, Joan Bruna, Giulio Biroli
The aim of this work is to understand this fact through the lens of dynamics in the loss landscape.
1 code implementation • 6 Jan 2019 • Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart
At this threshold, we argue that $\|f_{N}\|$ diverges.
no code implementations • 22 Oct 2018 • Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart
We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved.
2 code implementations • 25 Sep 2018 • Mario Geiger, Stefano Spigler, Stéphane d'Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart
In the vicinity of this transition, properties of the curvature of the minima of the loss are critical.