no code implementations • 28 Feb 2024 • Jin Hwa Lee, Stefano Sarao Mannelli, Andrew Saxe
Diverse studies in systems neuroscience begin with extended periods of training known as 'shaping' procedures.
no code implementations • 17 Jun 2023 • Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe
Reinforcement learning (RL) algorithms have proven transformative in a range of domains.
no code implementations • 2 Mar 2023 • Federica Gerace, Diego Doimo, Stefano Sarao Mannelli, Luca Saglietti, Alessandro Laio
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task, and then adapting only the last layers to a data-poor target task.
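A minimal PyTorch sketch of this freezing protocol; the architecture, layer sizes, and split between extractor and head below are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

# Toy network: feature-extractor layers followed by a linear head.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # feature extractor,
    nn.Linear(256, 128), nn.ReLU(),  # pre-trained on the source task
    nn.Linear(128, 10),              # head, adapted to the target task
)

# "Freeze" every layer except the head.
for param in model[:-1].parameters():
    param.requires_grad = False

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-2
)
```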
no code implementations • 31 May 2022 • Stefano Sarao Mannelli, Federica Gerace, Negar Rostamzadeh, Luca Saglietti
We then consider a novel mitigation strategy based on a matched inference approach, which consists in introducing coupled learning models.
1 code implementation • 18 May 2022 • Sebastian Lee, Stefano Sarao Mannelli, Claudia Clopath, Sebastian Goldt, Andrew Saxe
Continual learning - learning new tasks in sequence while maintaining performance on old tasks - remains particularly challenging for artificial neural networks.
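A minimal sketch of this sequential-training setting, assuming a standard PyTorch model and a list of (train_loader, test_loader) pairs per task; all names and the training loop are illustrative, not the paper's code:

```python
import torch

def evaluate(model, loader):
    """Test accuracy of `model` on a data loader."""
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def train_sequentially(model, tasks, loss_fn, epochs=1, lr=1e-2):
    """Train on each task in turn; after each one, evaluate on all tasks
    seen so far, so that accuracy drops expose catastrophic forgetting.
    `tasks` is a list of (train_loader, test_loader) pairs."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for t, (train_loader, _) in enumerate(tasks):
        for _ in range(epochs):
            for x, y in train_loader:
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
        history.append([evaluate(model, test) for _, test in tasks[: t + 1]])
    return history
```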
no code implementations • 15 Jun 2021 • Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe
We provide an exact description of the online learning setting, confirming the long-standing experimental observation that curricula can modestly speed up learning.
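A toy sketch of such an online (single-pass) curriculum, where "difficulty" is simulated as per-example label noise (an illustrative proxy, not the paper's analytical setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-pass SGD on a linear regression task, presenting examples in
# order of increasing "difficulty" (here, per-example label noise).
d, n = 50, 2000
w_star = rng.standard_normal(d) / np.sqrt(d)     # teacher vector
noise = rng.uniform(0.0, 2.0, size=n)            # per-example difficulty
X = rng.standard_normal((n, d))
y = X @ w_star + noise * rng.standard_normal(n)

order = np.argsort(noise)                        # curriculum: easy first
w, lr = np.zeros(d), 0.05
for i in order:                                  # one online pass
    w -= lr * (X[i] @ w - y[i]) * X[i]

cos = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
print("alignment with teacher:", cos)
```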
no code implementations • 9 Jun 2021 • Federica Gerace, Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe, Lenka Zdeborová
Transfer learning can significantly improve the sample efficiency of neural networks by exploiting the relatedness between a data-scarce target task and a data-abundant source task.
no code implementations • NeurIPS 2021 • Stefano Sarao Mannelli, Pierfrancesco Urbani
The optimization step in many machine learning problems rarely relies on vanilla gradient descent; instead, it is common practice to use momentum-based accelerated methods.
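For concreteness, one common such method is Polyak's heavy-ball update, sketched below on a toy quadratic; this is a generic illustration, not the specific dynamics analysed in the paper:

```python
import numpy as np

def heavy_ball(grad, w0, lr=0.01, beta=0.9, steps=1000):
    """Polyak's heavy-ball method: gradient descent plus a velocity
    term that accumulates a decayed memory of past gradients.
    Setting beta = 0 recovers vanilla gradient descent."""
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        v = beta * v - lr * grad(w)
        w = w + v
    return w

# Example: minimize the quadratic f(w) = 0.5 * w^T A w.
A = np.diag([1.0, 10.0])
w_min = heavy_ball(lambda w: A @ w, w0=np.array([5.0, 5.0]))
print(w_min)  # close to the minimizer [0, 0]
```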
no code implementations • 20 Sep 2020 • Antoine Baker, Indaco Biazzo, Alfredo Braunstein, Giovanni Catania, Luca Dall'Asta, Alessandro Ingrosso, Florent Krzakala, Fabio Mazza, Marc Mézard, Anna Paola Muntoni, Maria Refinetti, Stefano Sarao Mannelli, Lenka Zdeborová
We conclude that probabilistic risk estimation is capable of enhancing the performance of digital contact tracing and should be considered in currently developed mobile applications.
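As a deliberately simplified illustration of the idea, the sketch below estimates each individual's risk from one round of known-infected contacts; the paper's approach performs full probabilistic inference over the contact network, not this one-step heuristic:

```python
import numpy as np

def one_step_risk(contacts, known_infected, p_transmit=0.1):
    """Crude one-round estimate: probability that individual i caught
    the infection from at least one known-infected contact.
    `contacts[i]` lists the people individual i met."""
    risk = np.zeros(len(contacts))
    for i, met in enumerate(contacts):
        k = sum(1 for j in met if j in known_infected)
        risk[i] = 1.0 - (1.0 - p_transmit) ** k
    return risk

contacts = [[1, 2], [0], [0, 3], [2]]
print(one_step_risk(contacts, known_infected={1, 3}))
```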
no code implementations • NeurIPS 2020 • Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová
We consider a teacher-student scenario where the teacher has the same structure as the student but a hidden layer of smaller width $m^*\le m$.
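A minimal sketch of generating data in such a teacher-student setup; the quadratic activation, unit output weights, and normalizations are assumptions for illustration, since the excerpt does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m_star, m, n = 100, 2, 8, 1000   # teacher width m* <= student width m

# Teacher: a two-layer network of width m_star generating the labels.
W_teacher = rng.standard_normal((m_star, d)) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = ((X @ W_teacher.T) ** 2).sum(axis=1)   # quadratic units, unit output weights

# Student: same architecture but wider hidden layer (m >= m_star),
# to be trained on (X, y); only the initialization is shown here.
W_student = rng.standard_normal((m, d)) / np.sqrt(d)
```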
no code implementations • 25 Jun 2020 • Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli
The workshop "Science meets Engineering in Deep Learning" took place in Vancouver as part of the Workshop section of NeurIPS 2019.
no code implementations • 24 Jun 2020 • Argyris Kalogeratos, Stefano Sarao Mannelli
In this paper we consider the epidemic competition between two generic diffusion processes, where each competing side is represented by a different state of a stochastic process.
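A toy discrete-time simulation of two such competing processes on a contact graph; the adoption rule and parameters below are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)

def compete(adj, state, beta_a=0.3, beta_b=0.3, steps=50):
    """Two processes, A and B, spread over a graph. state[i] is
    0 (susceptible), 1 (A) or 2 (B); a susceptible node adopts the
    state of each A/B neighbour independently with prob beta_a/beta_b,
    and simultaneous adoptions are resolved by a coin flip."""
    for _ in range(steps):
        new = state.copy()
        for i in np.where(state == 0)[0]:
            nbrs = np.where(adj[i])[0]
            got_a = any(state[j] == 1 and rng.random() < beta_a for j in nbrs)
            got_b = any(state[j] == 2 and rng.random() < beta_b for j in nbrs)
            if got_a and got_b:
                new[i] = rng.integers(1, 3)   # tie: 1 or 2 at random
            elif got_a or got_b:
                new[i] = 1 if got_a else 2
        state = new
    return state

adj = rng.random((30, 30)) < 0.1
adj = adj | adj.T
np.fill_diagonal(adj, False)
state = np.zeros(30, dtype=int)
state[0], state[1] = 1, 2                      # one seed per process
print(np.bincount(compete(adj, state), minlength=3))
```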
no code implementations • NeurIPS 2020 • Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability to find good minima instead of being trapped in spurious ones remains, to a large extent, an open problem.
no code implementations • 2 Jan 2020 • Stefano Sarao Mannelli, Lenka Zdeborová
We review recent works on analyzing the dynamics of gradient-based algorithms in a prototypical statistical inference problem.
1 code implementation • NeurIPS 2019 • Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Lenka Zdeborová
Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice when optimising high-dimensional non-convex functions, and why they find good minima instead of being trapped in spurious ones. Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model. Our framework is based on the Kac-Rice analysis of stationary points and a closed-form analysis of gradient flow originating from statistical physics.
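A schematic sketch of the spiked matrix-tensor setup with projected gradient descent; the normalizations, noise levels, and symmetry treatment below are illustrative, and the paper analyses gradient flow in closed form rather than simulating it:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d2, d3 = 50, 0.5, 1.0   # dimension and noise levels Delta_2, Delta_3

# Planted signal on the sphere |x|^2 = N, observed through a noisy
# matrix Y and a noisy order-3 tensor T (normalizations are schematic).
x_star = rng.standard_normal(N)
x_star *= np.sqrt(N) / np.linalg.norm(x_star)
Y = np.outer(x_star, x_star) / np.sqrt(N) + np.sqrt(d2) * rng.standard_normal((N, N))
T = (np.einsum('i,j,k->ijk', x_star, x_star, x_star) / N
     + np.sqrt(d3) * rng.standard_normal((N, N, N)))

def grad(x):
    # Gradient of the schematic loss
    #   L(x) = -x^T Y x / (2 d2 sqrt(N)) - <T, x(x)x(x)x> / (d3 N),
    # treating the noise tensors as approximately symmetric.
    g2 = (Y + Y.T) @ x / (2 * d2 * np.sqrt(N))
    g3 = 3 * np.einsum('ijk,j,k->i', T, x, x) / (d3 * N)
    return -(g2 + g3)

# Projected gradient descent on the sphere from a random start.
x = rng.standard_normal(N)
x *= np.sqrt(N) / np.linalg.norm(x)
for _ in range(500):
    x = x - 0.05 * grad(x)
    x *= np.sqrt(N) / np.linalg.norm(x)   # enforce the spherical constraint
print("overlap with the signal:", abs(x @ x_star) / N)
```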
no code implementations • 18 Jul 2019 • Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Lenka Zdeborová
Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice when optimising high-dimensional non-convex functions, and why they find good minima instead of being trapped in spurious ones.
no code implementations • 1 Feb 2019 • Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
In this work we quantitatively analyse the interplay between the loss landscape and the performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model.
no code implementations • 21 Dec 2018 • Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
Gradient-descent-based algorithms and their stochastic versions have widespread applications in machine learning and statistical inference.