Search Results for author: Clément Hongler

Found 12 papers, 4 papers with code

Looking for Complexity at Phase Boundaries in Continuous Cellular Automata

no code implementations • 27 Feb 2024 • Vassilis Papadopoulos, Guilhem Doat, Arthur Renard, Clément Hongler

One key challenge in Artificial Life is designing systems that display the emergence of complex behaviors.

Artificial Life
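The systems studied in this line of work are Lenia-style continuous cellular automata. As a minimal sketch (kernel shape, growth parameters, and grid size below are illustrative choices, not the paper's configuration), one update step looks like:

```python
import numpy as np

def ring_kernel(n, radius=13.0):
    """Smooth ring-shaped convolution kernel on an n x n torus,
    centered at the origin so FFT convolution needs no shift."""
    y, x = np.meshgrid(np.fft.fftfreq(n) * n, np.fft.fftfreq(n) * n,
                       indexing="ij")
    d = np.sqrt(x ** 2 + y ** 2) / radius  # distance in kernel radii
    k = np.where(d < 1.0, np.exp(-((d - 0.5) ** 2) / (2 * 0.15 ** 2)), 0.0)
    return k / k.sum()                     # normalize to unit mass

def step(grid, kernel_fft, dt=0.1, mu=0.15, sigma=0.015):
    """One Lenia-style update: convolve with the kernel, apply a
    Gaussian growth map, then clip states back to [0, 1]."""
    u = np.real(np.fft.ifft2(np.fft.fft2(grid) * kernel_fft))
    growth = 2.0 * np.exp(-((u - mu) ** 2) / (2 * sigma ** 2)) - 1.0
    return np.clip(grid + dt * growth, 0.0, 1.0)

rng = np.random.default_rng(0)
grid = rng.random((64, 64))
kernel_fft = np.fft.fft2(ring_kernel(64))
for _ in range(20):
    grid = step(grid, kernel_fft)
```

Sweeping parameters such as mu and sigma moves the system between homogeneous ("dead") and saturated phases; the paper's thesis is that complex behavior concentrates near such phase boundaries.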

Arrows of Time for Large Language Models

2 code implementations • 30 Jan 2024 • Vassilis Papadopoulos, Jérémie Wenger, Clément Hongler

We study the probabilistic modeling performed by Autoregressive Large Language Models (LLMs) through the angle of time directionality, addressing a question first raised in (Shannon, 1951).

Language Modelling
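The question — how a sequence model's per-token loss differs when predicting left-to-right versus right-to-left — can be illustrated at toy scale with a character bigram model (the text and model below are illustrative stand-ins; the paper studies autoregressive LLMs):

```python
from collections import Counter
import math

def avg_nll(text):
    """Average per-character negative log-likelihood (in nats) of a
    bigram model fit to and evaluated on the same string."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = len(text) - 1
    return -sum(c * math.log(c / firsts[a])
                for (a, _), c in pairs.items()) / n

text = "the quick brown fox jumps over the lazy dog " * 50
fwd = avg_nll(text)        # left-to-right (next-character) model
bwd = avg_nll(text[::-1])  # right-to-left (previous-character) model
print(f"forward: {fwd:.4f} nats  backward: {bwd:.4f} nats")
```

Shannon (1951) observed that n-gram statistics give nearly identical forward and backward entropies; the paper asks whether trained LLMs nonetheless exhibit a systematic "arrow of time" gap.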

Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity

no code implementations • 31 May 2022 • Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be achieved with at most $N(N+1)$ neurons in each hidden layer (where $N$ is the size of the training set).

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

no code implementations • 30 Jun 2021 • Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$.

L2 Regularization

Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances

1 code implementation • 25 May 2021 • Berfin Şimşek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea

For a two-layer overparameterized network of width $ r^*+ h =: m $ we explicitly describe the manifold of global minima: it consists of $ T(r^*, m) $ affine subspaces of dimension at least $ h $ that are connected to one another.

Smart Proofs via Smart Contracts: Succinct and Informative Mathematical Derivations via Decentralized Markets

no code implementations • 5 Feb 2021 • Sylvain Carré, Franck Gabriel, Clément Hongler, Gustavo Lacerda, Gloria Capano

We propose a game-theoretic discussion of SPRIG, showing how agents with various types of information interact, leading to a proof tree with an appropriate level of detail and to the invalidation of wrong proofs, and we discuss resilience against various attacks.

Kernel Alignment Risk Estimator: Risk Prediction from Training Data

no code implementations • NeurIPS 2020 • Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations) we capture the mean and variance of the KRR predictor.

Regression
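The object whose risk the Kernel Alignment Risk Estimator predicts is the Kernel Ridge Regression predictor. A minimal KRR sketch (RBF kernel, toy 1-D data, illustrative ridge parameter — not the KARE estimator itself):

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix k(x, z) = exp(-gamma ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def krr_predict(X, y, X_test, lam=1e-3):
    """KRR predictor f(x) = k(x, X) (K + n * lam * I)^{-1} y."""
    n = len(X)
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return rbf_kernel(X_test, X) @ alpha

# Toy regression problem: recover sin(x) from 50 noiseless samples.
X = np.linspace(-3, 3, 50)[:, None]
y = np.sin(X[:, 0])
y_hat = krr_predict(X, y, X)
train_mse = float(np.mean((y_hat - y) ** 2))
```

KARE's point is that the test risk of this predictor can be estimated from the Gram matrix K and labels y alone, without a held-out set.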

Implicit Regularization of Random Feature Models

no code implementations • ICML 2020 • Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR).
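One standard instance of a Gaussian random feature model uses random Fourier features, whose Gram matrix concentrates around the Gaussian kernel as the number of features P grows — the regime in which RF models approach KRR. A sketch (dimensions and P are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, P = 40, 3, 20000   # samples, input dimension, random features

X = rng.normal(size=(n, d))

# Exact Gaussian kernel with unit bandwidth: k(x, z) = exp(-||x - z||^2 / 2).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-d2 / 2.0)

# Random Fourier features (Rahimi & Recht): phi(x) = sqrt(2/P) cos(W^T x + b)
# with W ~ N(0, I), b ~ Uniform[0, 2pi) gives E[phi(x) . phi(z)] = k(x, z).
W = rng.normal(size=(d, P))
b = rng.uniform(0.0, 2.0 * np.pi, size=P)
Phi = np.sqrt(2.0 / P) * np.cos(X @ W + b)
K_rf = Phi @ Phi.T       # Monte-Carlo estimate of K_exact

max_err = float(np.abs(K_rf - K_exact).max())
```

Ridge regression on Phi is then a finite-P surrogate for KRR with K_exact; the paper quantifies, via random matrix theory, the implicit extra regularization this substitution induces.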

The asymptotic spectrum of the Hessian of DNN throughout training

no code implementations • ICLR 2020 • Arthur Jacot, Franck Gabriel, Clément Hongler

The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK).

Order and Chaos: NTK views on DNN Normalization, Checkerboard and Boundary Artifacts

no code implementations • 11 Jul 2019 • Arthur Jacot, Franck Gabriel, François Ged, Clément Hongler

Moving the network into the chaotic regime prevents checkerboard patterns; we propose a graph-based parametrization which eliminates border artifacts; finally, we introduce a new layer-dependent learning rate to improve the convergence of DC-NNs.

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

6 code implementations • NeurIPS 2018 • Arthur Jacot, Franck Gabriel, Clément Hongler

While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training.

Gaussian Processes
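The convergence claim can be probed at finite width: under the NTK parametrization, the empirical NTK $\Theta(x,z) = \langle \nabla_\theta f(x), \nabla_\theta f(z) \rangle$ of a wide network concentrates over random initializations. A sketch for a two-layer tanh network (architecture, widths, and sizes are illustrative choices, not the paper's setup):

```python
import numpy as np

def empirical_ntk(X, W, a):
    """Empirical NTK of f(x) = a . tanh(W x) / sqrt(m)
    (two-layer network, NTK parametrization, width m)."""
    m = a.shape[0]
    H = np.tanh(W @ X.T)              # (m, n) hidden activations
    Hp = 1.0 - H ** 2                 # tanh' at the pre-activations
    K_a = H.T @ H / m                 # gradients w.r.t. output weights a
    A = a[:, None] * Hp               # a_i * tanh'(w_i . x)
    K_W = (X @ X.T) * (A.T @ A) / m   # gradients w.r.t. input weights W
    return K_a + K_W

rng = np.random.default_rng(0)
n, d, m = 8, 4, 5000
X = rng.normal(size=(n, d)) / np.sqrt(d)

# Two independent initializations: at large width the two empirical
# kernels nearly coincide, illustrating convergence to the limiting NTK.
K1 = empirical_ntk(X, rng.normal(size=(m, d)), rng.normal(size=m))
K2 = empirical_ntk(X, rng.normal(size=(m, d)), rng.normal(size=m))
rel_diff = float(np.abs(K1 - K2).max() / np.abs(K1).max())
```

Growing m shrinks rel_diff (fluctuations are O(1/sqrt(m))), matching the theorem that the NTK becomes deterministic, and constant in training time, in the infinite-width limit.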
