Search Results for author: Arthur Jacot

Found 21 papers, 5 papers with code

Shallow diffusion networks provably learn hidden low-dimensional structure

no code implementations15 Oct 2024 Nicholas M. Boffi, Arthur Jacot, Stephen Tu, Ingvar Ziemann

Diffusion-based generative models provide a powerful framework for learning to sample from a complex target distribution.

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

no code implementations7 Oct 2024 Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli

We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of the linear layers (for within-class variability collapse), and (ii) bounded conditioning of the features before the linear part (for orthogonality of class-means, as well as their alignment with weight matrices).
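
As an illustration of condition (i), within-class variability collapse can be checked numerically by comparing within-class to between-class feature variance. The sketch below is a generic diagnostic under that interpretation; the feature/label arrays and the specific ratio are illustrative assumptions, not the exact quantity analyzed in the paper.

```python
import numpy as np

def within_class_variability(features: np.ndarray, labels: np.ndarray) -> float:
    """Ratio of within-class to between-class feature variance.

    A value near zero indicates within-class variability collapse (NC1).
    `features` has shape (num_samples, dim); `labels` holds class indices.
    Illustrative diagnostic only; not the paper's exact metric.
    """
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        within += ((class_feats - class_mean) ** 2).sum()
        between += len(class_feats) * ((class_mean - global_mean) ** 2).sum()
    return within / between
```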

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

1 code implementation8 Jul 2024 Arthur Jacot, Seok Hoan Choi, Yuxiao Wen

We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded $F_{1}$-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot.
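
For context, one common convention defines the $F_{1}$-norm of a shallow ReLU network $f(x) = \sum_i a_i\,\sigma(w_i \cdot x)$ as $\sum_i |a_i|\,\|w_i\|_2$. The helper below computes this quantity; the normalization is an assumption and may differ from the paper's.

```python
import numpy as np

def f1_norm_shallow(a: np.ndarray, W: np.ndarray) -> float:
    """F1-norm of a shallow ReLU network f(x) = sum_i a[i] * relu(W[i] @ x).

    Computed as sum_i |a[i]| * ||W[i]||_2, one standard convention for the
    variation / F1-norm of two-layer networks; normalizations vary.
    """
    return float(np.sum(np.abs(a) * np.linalg.norm(W, axis=1)))
```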

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes

no code implementations27 May 2024 Zhenfeng Tu, Santiago Aranguri, Arthur Jacot

The training dynamics of linear networks are well studied in two distinct setups, the lazy regime and the balanced/active regime, depending on the initialization and width of the network.

Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets

no code implementations27 May 2024 Arthur Jacot, Alexandre Kaiser

We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work: for large $\tilde{L}$ the potential energy dominates and leads to a separation of timescales, where the representation jumps rapidly from the high-dimensional inputs to a low-dimensional representation, moves slowly within the space of low-dimensional representations, and finally jumps back to the potentially high-dimensional outputs.

Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning

no code implementations12 Feb 2024 Yuxiao Wen, Arthur Jacot

We describe the emergence of a Convolution Bottleneck (CBN) structure in CNNs, where the network uses its first few layers to transform the input representation into a representation that is supported only along a few frequencies and channels, before using the last few layers to map back to the outputs.
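
One simple way to probe such a structure empirically is to count how many 2-D frequencies carry most of the energy of an intermediate representation. The diagnostic below is an illustrative sketch; the array layout and energy threshold are assumptions, not the paper's measurement.

```python
import numpy as np

def frequency_support(feature_maps: np.ndarray, threshold: float = 0.99) -> int:
    """Number of 2-D frequencies needed to capture `threshold` of the energy.

    `feature_maps` has shape (channels, height, width). A count that is small
    relative to height * width is consistent with a representation supported
    on only a few frequencies. Illustrative diagnostic only.
    """
    spectrum = np.abs(np.fft.fft2(feature_maps, axes=(-2, -1))) ** 2
    energy = spectrum.sum(axis=0).ravel()  # aggregate over channels
    sorted_energy = np.sort(energy)[::-1]
    cumulative = np.cumsum(sorted_energy) / energy.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)
```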

Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff

no code implementations NeurIPS 2023 Arthur Jacot

Finally, we prove the conjectured bottleneck structure in the learned features as $L\to\infty$: for large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_{\ell}$ have $R^{(0)}(f)$ singular values close to 1 while the others are $O(L^{-\frac{1}{2}})$.
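
The singular-value statement can be checked directly on a trained network's weight matrices; the helper below counts, for each layer, the singular values close to 1. It is purely an illustrative diagnostic, and the tolerance is an arbitrary choice rather than a quantity from the paper.

```python
import numpy as np

def bottleneck_profile(weights: list, tol: float = 0.1) -> list:
    """For each weight matrix, count the singular values within `tol` of 1.

    The bottleneck-structure result predicts that, for large depth L, this
    count is roughly R^(0)(f) in almost every layer, with the remaining
    singular values of order L^(-1/2). Illustrative diagnostic only.
    """
    profile = []
    for W in weights:
        s = np.linalg.svd(W, compute_uv=False)
        profile.append(int(np.sum(np.abs(s - 1.0) < tol)))
    return profile
```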

Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank

no code implementations25 May 2023 Zihan Wang, Arthur Jacot

The $L_{2}$-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks.

Matrix Completion

Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions

no code implementations29 Sep 2022 Arthur Jacot

We show that the representation cost of fully connected neural networks with homogeneous nonlinearities - which describes the implicit bias in function space of networks with $L_2$-regularization or with losses such as the cross-entropy - converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions.
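
For reference, the representation cost referred to here is, in the convention commonly used in this line of work (the paper's exact normalization may differ), $R(f; L) = \min\{\|\theta\|^{2} : f_{\theta} = f\}$ over parameters $\theta$ of a depth-$L$ network realizing $f$, and the convergence statement concerns $R(f; L)/L$ as $L \to \infty$.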

Denoising

Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity

no code implementations31 May 2022 Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be achieved with at most $N(N+1)$ neurons in each hidden layer (where $N$ is the size of the training set).

Understanding Layer-wise Contributions in Deep Neural Networks through Spectral Analysis

no code implementations6 Nov 2021 Yatin Dandi, Arthur Jacot

Spectral analysis is a powerful tool, decomposing any function into simpler parts.

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

no code implementations30 Jun 2021 Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.
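
A minimal, hypothetical experiment in the small-initialization regime is sketched below: a three-layer deep linear network trained from a tiny $\sigma$ typically decreases its loss in plateaus, visiting saddles of increasing rank. All sizes, seeds, and hyperparameters are illustrative assumptions and may need tuning.

```python
import torch

torch.manual_seed(0)
d, n, sigma = 10, 100, 1e-2
X = torch.randn(n, d)
teacher = torch.randn(d, 2) @ torch.randn(2, d) / d ** 0.5  # rank-2 target
Y = X @ teacher.T

# Deep linear network f(x) = W3 W2 W1 x with small initialization variance.
Ws = [torch.nn.Parameter(sigma * torch.randn(d, d)) for _ in range(3)]
opt = torch.optim.SGD(Ws, lr=0.02)
for step in range(20000):
    W = Ws[2] @ Ws[1] @ Ws[0]
    loss = ((X @ W.T - Y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 2000 == 0:
        print(step, float(loss))  # loss typically drops in discrete steps
```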

L2 Regularization

DNN-Based Topology Optimisation: Spatial Invariance and Neural Tangent Kernel

1 code implementation NeurIPS 2021 Benjamin Dupuis, Arthur Jacot

We study the Solid Isotropic Material Penalisation (SIMP) method with a density field generated by a fully-connected neural network, taking the coordinates as inputs.
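
A minimal sketch of this parametrization (layer widths and the sigmoid output are assumptions, not the paper's exact choices): a coordinate-to-density MLP evaluated on a grid over the design domain.

```python
import torch
import torch.nn as nn

# Fully-connected network mapping spatial coordinates (x, y) to a material
# density in [0, 1], as in a neural parametrization of the SIMP density field.
density_net = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

# Evaluate the density on a 32 x 32 grid of the unit square.
xs, ys = torch.meshgrid(torch.linspace(0, 1, 32), torch.linspace(0, 1, 32), indexing="ij")
coords = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=1)
density = density_net(coords).reshape(32, 32)
```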

Translation

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

1 code implementation25 May 2021 Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea

For a two-layer overparameterized network of width $r^* + h =: m$ we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another.

Kernel Alignment Risk Estimator: Risk Prediction from Training Data

no code implementations NeurIPS 2020 Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations) we capture the mean and variance of the KRR predictor.
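
For reference, the KRR predictor whose risk the estimator targets can be written in the standard form below. This is a generic textbook implementation, not the paper's estimator; in particular the scaling of the ridge by $N$ is an assumption, and conventions vary.

```python
import numpy as np

def krr_predict(K_train: np.ndarray, K_test_train: np.ndarray,
                y: np.ndarray, ridge: float) -> np.ndarray:
    """Kernel ridge regression predictor.

    K_train is the (n, n) Gram matrix on the training set and K_test_train
    the (m, n) kernel matrix between test and training points. The ridge is
    scaled by n here; normalization conventions vary across papers.
    """
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + ridge * n * np.eye(n), y)
    return K_test_train @ alpha
```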

Regression

Implicit Regularization of Random Feature Models

no code implementations ICML 2020 Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR).
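
The connection can be illustrated numerically: ridge regression on random Fourier features approximating a Gaussian kernel approaches the corresponding KRR predictor as the number of features grows, and the paper characterizes the finite-width gap through an effective ridge. Everything below, including the data and hyperparameters, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, lam, bandwidth = 50, 5, 2000, 1e-3, 1.0
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])

# Random Fourier features approximating the Gaussian kernel:
# phi(x) = sqrt(2/p) * cos(x @ W + b)
W = rng.normal(scale=1.0 / bandwidth, size=(d, p))
b = rng.uniform(0, 2 * np.pi, size=p)
Phi = np.sqrt(2.0 / p) * np.cos(X @ W + b)

# Exact Gaussian kernel on the same data.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2 * bandwidth ** 2))

rf_pred = Phi @ np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)
krr_pred = K @ np.linalg.solve(K + lam * np.eye(n), y)
print(np.abs(rf_pred - krr_pred).max())  # small when p is large
```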

The asymptotic spectrum of the Hessian of DNN throughout training

no code implementations ICLR 2020 Arthur Jacot, Franck Gabriel, Clément Hongler

The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK).

Order and Chaos: NTK views on DNN Normalization, Checkerboard and Boundary Artifacts

no code implementations11 Jul 2019 Arthur Jacot, Franck Gabriel, François Ged, Clément Hongler

Moving the network into the chaotic regime prevents checkerboard patterns; we propose a graph-based parametrization which eliminates border artifacts; finally, we introduce a new layer-dependent learning rate to improve the convergence of DC-NNs.

Disentangling feature and lazy training in deep neural networks

no code implementations19 Jun 2019 Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart

Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$.
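
In notation often used in this literature (not necessarily the paper's), the two limits correspond to different scalings of the output layer: the lazy/NTK limit arises for $f(x) = \frac{1}{\sqrt{h}} \sum_{i=1}^{h} a_i\,\sigma(w_i \cdot x)$, while the feature-learning/mean-field limit arises for $f(x) = \frac{1}{h} \sum_{i=1}^{h} a_i\,\sigma(w_i \cdot x)$.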

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

6 code implementations NeurIPS 2018 Arthur Jacot, Franck Gabriel, Clément Hongler

While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training.
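
A finite-width, empirical version of the NTK can be computed directly from parameter gradients. The sketch below, for a scalar-output PyTorch model and looping over inputs for clarity, is an illustrative implementation rather than the paper's code.

```python
import torch
import torch.nn as nn

def empirical_ntk(model: nn.Module, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Empirical NTK Gram matrix for a scalar-output model.

    Entry (i, j) is <grad_theta f(x1[i]), grad_theta f(x2[j])>. In the
    infinite-width limit this matrix converges to a deterministic limiting
    kernel and stays constant during training.
    """
    def param_grads(x):
        rows = []
        for xi in x:
            out = model(xi.unsqueeze(0)).squeeze()
            grads = torch.autograd.grad(out, list(model.parameters()))
            rows.append(torch.cat([g.reshape(-1) for g in grads]))
        return torch.stack(rows)
    return param_grads(x1) @ param_grads(x2).T

# Example: a small fully-connected network on random inputs.
net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 1))
x = torch.randn(8, 3)
K = empirical_ntk(net, x, x)  # (8, 8) Gram matrix
```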

Gaussian Processes
