Robust Learning with Jacobian Regularization
2 code implementations • ICLR 2020 • Judy Hoffman, Daniel A. Roberts, Sho Yaida
Design of reliable systems must guarantee stability against input perturbations.
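As a rough illustration of the idea, here is a minimal PyTorch sketch of penalizing the input-output Jacobian via random projections; the number of projections `n_proj` and the weighting `lam` are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def jacobian_penalty(model, x, n_proj=1):
    """Monte Carlo estimate of the squared Frobenius norm of the
    input-output Jacobian, using random unit projections of the output."""
    x = x.clone().requires_grad_(True)
    y = model(x)                                  # shape (batch, n_classes)
    C = y.shape[-1]
    penalty = 0.0
    for _ in range(n_proj):
        v = torch.randn_like(y)
        v = v / v.norm(dim=-1, keepdim=True)      # random unit vector per sample
        # v^T J via one backward pass; create_graph so the penalty is trainable
        Jv, = torch.autograd.grad((y * v).sum(), x, create_graph=True)
        # E_v[||v^T J||^2] = ||J||_F^2 / C for v uniform on the unit sphere
        penalty = penalty + C * Jv.pow(2).sum() / (n_proj * x.shape[0])
    return penalty

# Illustrative usage: task loss plus a small Jacobian penalty
# loss = torch.nn.functional.cross_entropy(model(x), targets) \
#        + lam * jacobian_penalty(model, x)
```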
Gradient Descent Happens in a Tiny Subspace
no code implementations • ICLR 2019 • Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer
We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training.
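One way to probe this empirically: collect flattened gradients over training steps and measure how much of a fresh gradient lies in their top-k principal subspace. The sketch below is an illustrative diagnostic, not the paper's exact measurement (the paper analyzes the subspace spanned by the top Hessian eigenvectors); `k` is an assumed hyperparameter.

```python
import numpy as np

def subspace_overlap(grads, g_new, k=10):
    """Fraction of the squared norm of g_new lying in the top-k
    principal subspace of previously observed gradients."""
    G = np.stack(grads)                      # (n_steps, n_params)
    # top-k right singular vectors span the dominant gradient subspace
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = Vt[:k]                               # (k, n_params), orthonormal rows
    proj = P @ g_new                         # coordinates in the subspace
    return float(proj @ proj) / float(g_new @ g_new)

# An overlap near 1.0 after a short period of training indicates the
# gradient has converged to a very small subspace.
```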
Topological Obstructions to Autoencoding
no code implementations • 16 Feb 2021 • Joshua Batson, C. Grace Haaf, Yonatan Kahn, Daniel A. Roberts
Using a series of illustrative low-dimensional examples, we show explicitly how the intrinsic and extrinsic topology of the dataset affects the behavior of an autoencoder and how this topology is manifested in the latent space representation during training.
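A toy version of one such low-dimensional example (the architecture and training details below are illustrative, not the paper's): data on a circle has the intrinsic topology of S^1, which cannot embed continuously and injectively in R^1, so a 1-dimensional bottleneck must tear the circle somewhere.

```python
import torch
import torch.nn as nn

# Points on the unit circle: topologically S^1
theta = torch.rand(1024, 1) * 2 * torch.pi
data = torch.cat([theta.cos(), theta.sin()], dim=1)

ae = nn.Sequential(
    nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1),   # encoder -> 1-d latent
    nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2),   # decoder
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss = ((ae(data) - data) ** 2).mean()
    loss.backward()
    opt.step()

# Plotting the 1-d latent coordinate against theta reveals a discontinuity
# (a "tear"): the circle's topology obstructs a faithful 1-d encoding.
```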
Why is AI hard and Physics simple?
no code implementations • 31 Mar 2021 • Daniel A. Roberts
We discuss why AI is hard and why physics is simple.
SGD Implicitly Regularizes Generalization Error
no code implementations • 10 Apr 2021 • Daniel A. Roberts
We derive a simple and model-independent formula for the change in the generalization gap due to a gradient descent update.
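For intuition, expanding both the train and test losses to first order in the learning rate already yields a formula of this kind (an illustrative expansion consistent with the abstract, not necessarily the paper's exact bookkeeping). For the update $\theta \to \theta - \eta\,\nabla L_{\text{train}}(\theta)$:

$$\Delta\bigl(L_{\text{test}} - L_{\text{train}}\bigr) \approx \eta\,\nabla L_{\text{train}} \cdot \bigl(\nabla L_{\text{train}} - \nabla L_{\text{test}}\bigr) + O(\eta^2),$$

so to leading order the generalization gap shrinks exactly when the train and test gradients are well aligned.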
The Principles of Deep Learning Theory
no code implementations • 18 Jun 2021 • Daniel A. Roberts, Sho Yaida, Boris Hanin
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
A Solvable Model of Neural Scaling Laws
no code implementations • 30 Oct 2022 • Alexander Maloney, Daniel A. Roberts, James Sully
Large language models with a huge number of parameters, when trained on a near-internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource.
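A common way to make this concrete is an additive power-law ansatz in the parameter count N and token count D; the functional form and all constants below are hypothetical illustrations, not the paper's fitted values.

```python
# Illustrative additive scaling ansatz: loss falls as a power law in
# whichever resource is scarce. A, B, alpha, beta, E are hypothetical.
A, B, alpha, beta, E = 400.0, 400.0, 0.34, 0.28, 1.7

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

# With tokens held fixed, scaling up parameters eventually stops helping:
for N in [1e7, 1e8, 1e9, 1e10]:
    print(f"N={N:.0e}: L={loss(N, 1e9):.3f}")   # bottlenecked by data at large N
```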
Feature Learning and Generalization in Deep Networks with Orthogonal Weights
no code implementations • 11 Oct 2023 • Hannah Day, Yonatan Kahn, Daniel A. Roberts
Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network.
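A quick numerical illustration of this tuning (the width, depth, and ReLU choice are assumptions; for ReLU, the critical Gaussian weight variance is 2/fan-in, i.e. He initialization):

```python
import numpy as np

def signal_norm(depth=50, width=1000, c_w=2.0, seed=0):
    """Average squared activation after propagating through a deep ReLU MLP
    with weights W_ij ~ N(0, c_w / width). For ReLU, c_w = 2 is critical."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(c_w / width)
        x = np.maximum(W @ x, 0.0)          # ReLU
    return np.mean(x**2)

for c_w in [1.5, 2.0, 2.5]:
    # decays exponentially, stays O(1), or grows exponentially with depth
    print(c_w, signal_norm(c_w=c_w))
```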
The Unreasonable Ineffectiveness of the Deeper Layers
no code implementations • 26 Mar 2024 • Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.
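A hedged sketch of similarity-based layer selection in the spirit of the paper: given hidden states collected at every layer (e.g. via forward hooks on a sample of data), choose the contiguous block of n layers whose input and output representations are most similar, since removing it should change the computation least. The names and exact distance normalization here are illustrative.

```python
import torch

def best_block_to_prune(hidden_states, n):
    """hidden_states: list of per-layer activations h[0..L], each a
    (tokens, d) tensor. Returns the start index of the n-layer block
    whose removal changes the representation least, measured by the
    angular distance between h[l] and h[l + n]."""
    L = len(hidden_states) - 1
    best_l, best_d = None, float("inf")
    for l in range(L - n + 1):
        a, b = hidden_states[l], hidden_states[l + n]
        cos = torch.nn.functional.cosine_similarity(a, b, dim=-1).mean()
        d = (torch.arccos(cos.clamp(-1, 1)) / torch.pi).item()  # in [0, 1]
        if d < best_d:
            best_l, best_d = l, d
    return best_l, best_d
```

Per the paper, the selected block is then removed and the remaining damage is "healed" with a small amount of parameter-efficient finetuning.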