no code implementations • 12 Feb 2024 • Yuxiao Wen, Arthur Jacot
We describe the emergence of a Convolution Bottleneck (CBN) structure in CNNs, where the network uses its first few layers to transform the input representation into a representation that is supported only along a few frequencies and channels, before using the last few layers to map back to the outputs.
no code implementations • 25 May 2023 • Zihan Wang, Arthur Jacot
The $L_{2}$-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks.
no code implementations • 29 Sep 2022 • Arthur Jacot
We show that the representation cost of fully connected neural networks with homogeneous nonlinearities, which describes the implicit bias in function space of networks with $L_2$-regularization or with losses such as the cross-entropy, converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions.
no code implementations • 31 May 2022 • Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel
This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be achieved with at most $N(N+1)$ neurons in each hidden layer (where $N$ is the size of the training set).
no code implementations • 6 Nov 2021 • Yatin Dandi, Arthur Jacot
Spectral analysis is a powerful tool, decomposing any function into simpler parts.
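As a toy illustration of this idea (not from the paper itself), the discrete Fourier transform decomposes a sampled function into its frequency components, and summing those components back recovers the function:

```python
import numpy as np

# Sample a function made of two sine modes on a uniform grid.
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
f = 3.0 * np.sin(2 * x) + 0.5 * np.sin(5 * x)

# The DFT decomposes f into its frequency components.
coeffs = np.fft.rfft(f) / n

# Only frequencies 2 and 5 carry non-negligible energy.
amplitudes = 2 * np.abs(coeffs)
dominant = np.nonzero(amplitudes > 1e-8)[0]
print(dominant)  # -> [2 5]

# Summing the modes back reconstructs the original function.
f_rec = np.fft.irfft(coeffs * n, n)
print(np.allclose(f, f_rec))  # -> True
```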
no code implementations • 30 Jun 2021 • Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel
The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.
1 code implementation • NeurIPS 2021 • Benjamin Dupuis, Arthur Jacot
We study the Solid Isotropic Material Penalisation (SIMP) method with a density field generated by a fully-connected neural network, taking the coordinates as inputs.
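A minimal sketch of the setup, assuming random weights in place of trained parameters (the `density_mlp` function and all sizes here are illustrative, not the paper's implementation): a fully-connected network maps grid coordinates to a density in $(0, 1)$, which SIMP then penalises with a power law so that intermediate densities are discouraged.

```python
import numpy as np

rng = np.random.default_rng(0)

def density_mlp(coords, w1, b1, w2, b2):
    """Map (x, y) coordinates to a material density in (0, 1)."""
    hidden = np.tanh(coords @ w1 + b1)       # fully-connected hidden layer
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid keeps density in (0, 1)

# A 32x32 design grid; each point's coordinates are the network's input.
xs, ys = np.meshgrid(np.linspace(0, 1, 32), np.linspace(0, 1, 32))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)   # shape (1024, 2)

# Random weights stand in for the trained parameters.
w1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

rho = density_mlp(coords, w1, b1, w2, b2).reshape(32, 32)

# SIMP penalisation: effective stiffness scales like rho**p with p > 1,
# which pushes optimised densities toward 0 or 1.
p = 3
stiffness = rho ** p
print(rho.shape)  # -> (32, 32)
```

In the actual method, the network weights (rather than a per-pixel density field) are the optimisation variables of the topology-optimisation problem.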
1 code implementation • 25 May 2021 • Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea
For a two-layer overparameterized network of width $r^* + h =: m$, we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another.
no code implementations • NeurIPS 2020 • Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel
Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations) we capture the mean and variance of the KRR predictor.
no code implementations • ICML 2020 • Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel
We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR).
no code implementations • ICLR 2020 • Arthur Jacot, Franck Gabriel, Clément Hongler
The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK).
no code implementations • 11 Jul 2019 • Arthur Jacot, Franck Gabriel, François Ged, Clément Hongler
Moving the network into the chaotic regime prevents checkerboard patterns; we propose a graph-based parametrization which eliminates border artifacts; finally, we introduce a new layer-dependent learning rate to improve the convergence of DC-NNs.
no code implementations • 19 Jun 2019 • Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart
Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$.
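The two limits can be seen in a toy one-hidden-layer network (an illustrative sketch, not the paper's setup): scaling the output by $1/\sqrt{h}$ (the NTK regime) keeps the output $O(1)$ as $h$ grows, while scaling by $1/h$ (the mean-field regime) shrinks it like $1/\sqrt{h}$ at initialization.

```python
import numpy as np

rng = np.random.default_rng(1)

def shallow_net(x, W, a, scale):
    """f(x) = scale * sum_i a_i * relu(w_i . x)."""
    return scale * (np.maximum(W @ x, 0.0) @ a)

x = rng.normal(size=5)
results = {}
for h in (100, 10_000):
    W = rng.normal(size=(h, 5))
    a = rng.normal(size=h)
    results[h] = {
        "ntk": shallow_net(x, W, a, 1.0 / np.sqrt(h)),  # 1/sqrt(h) scaling
        "mf": shallow_net(x, W, a, 1.0 / h),            # 1/h scaling
    }

# For the same weights, the mean-field output is exactly the NTK output
# divided by sqrt(h), so it vanishes as the width grows.
for h, r in results.items():
    print(h, abs(r["ntk"]), abs(r["mf"]))
```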
1 code implementation • 6 Jan 2019 • Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart
At this threshold, we argue that $\|f_{N}\|$ diverges.
6 code implementations • NeurIPS 2018 • Arthur Jacot, Franck Gabriel, Clément Hongler
While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training.
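The empirical NTK at a given parameter vector is the Gram matrix of parameter gradients, $\Theta(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle$. A minimal numpy sketch for a one-hidden-layer ReLU network in the NTK parametrisation (the function names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def net(x, params):
    """Tiny one-hidden-layer ReLU network, NTK parametrisation."""
    W, a = params
    h = W.shape[0]
    return (np.maximum(W @ x, 0.0) @ a) / np.sqrt(h)

def grad(x, params):
    """Gradient of net(x) with respect to all parameters, flattened."""
    W, a = params
    h = W.shape[0]
    act = np.maximum(W @ x, 0.0)
    mask = (W @ x > 0).astype(float)
    dW = np.outer(a * mask, x) / np.sqrt(h)   # d f / d W
    da = act / np.sqrt(h)                     # d f / d a
    return np.concatenate([dW.ravel(), da])

# Empirical NTK on a few inputs: Theta(x, x') = <grad f(x), grad f(x')>.
h, d = 512, 3
params = (rng.normal(size=(h, d)), rng.normal(size=h))
xs = [rng.normal(size=d) for _ in range(4)]
K = np.array([[grad(u, params) @ grad(v, params) for v in xs] for u in xs])

# As a Gram matrix, the empirical NTK is symmetric positive semi-definite.
print(np.allclose(K, K.T))  # -> True
```

In the infinite-width limit described above, this random matrix concentrates around the deterministic limiting kernel and stays fixed during training.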