Search Results for author: Daniel A. Roberts

Found 9 papers, 1 paper with code

Gradient Descent Happens in a Tiny Subspace

no code implementations ICLR 2019 Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer

We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training.

General Classification
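
The paper identifies this subspace with the span of the top Hessian eigenvectors, with dimension tied to the number of classes. As a hedged illustration (not code from the paper), one can measure the fraction of the gradient norm lying in the top-k Hessian eigenspace via Hessian-vector products; the task, model, and hyperparameters below are made up, and how large the fraction comes out depends on the setup:

```python
import torch

torch.manual_seed(0)
# Made-up 3-class task and tiny MLP, purely for illustration
X, y = torch.randn(256, 10), torch.randint(0, 3, (256,))
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 3)
)
params = list(model.parameters())
n = sum(p.numel() for p in params)

def loss():
    return torch.nn.functional.cross_entropy(model(X), y)

# A short period of training, matching the regime the paper describes
opt = torch.optim.SGD(params, lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss().backward()
    opt.step()

def hvp(v):
    # Hessian-vector product via double backprop
    g = torch.autograd.grad(loss(), params, create_graph=True)
    gv = (torch.cat([t.reshape(-1) for t in g]) * v).sum()
    return torch.cat([t.reshape(-1) for t in torch.autograd.grad(gv, params)])

# Crude subspace (block power) iteration for the top-k Hessian eigenvectors
k = 3  # the paper relates k to the number of classes
V = torch.linalg.qr(torch.randn(n, k)).Q
for _ in range(30):
    V = torch.linalg.qr(torch.stack([hvp(V[:, i]) for i in range(k)], dim=1)).Q

g = torch.cat([t.reshape(-1) for t in torch.autograd.grad(loss(), params)])
print(f"fraction of gradient norm in top-{k} Hessian subspace: "
      f"{((V.T @ g).norm() / g.norm()).item():.3f}")
```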

Topological Obstructions to Autoencoding

no code implementations 16 Feb 2021 Joshua Batson, C. Grace Haaf, Yonatan Kahn, Daniel A. Roberts

Using a series of illustrative low-dimensional examples, we show explicitly how the intrinsic and extrinsic topology of the dataset affects the behavior of an autoencoder and how this topology is manifested in the latent space representation during training.

Anomaly Detection, Inductive Bias
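
A toy version of such an obstruction (an illustration, not the paper's code): data on a circle S^1 admits no continuous injection into a 1-dimensional latent space, so a 2 -> 1 -> 2 autoencoder must "tear" the circle somewhere, which shows up as a localized spike in reconstruction error:

```python
import torch

torch.manual_seed(0)
theta = torch.rand(512, 1) * 2 * torch.pi
X = torch.cat([torch.cos(theta), torch.sin(theta)], dim=1)  # data on S^1 in R^2

# 2 -> 1 -> 2 autoencoder: a 1-d latent cannot embed a circle continuously
enc = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
dec = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    ((dec(enc(X)) - X) ** 2).mean().backward()
    opt.step()

# Per-point error is forced to spike where the encoder "cuts" the circle
err = ((dec(enc(X)) - X) ** 2).sum(dim=1)
print(f"mean error {err.mean().item():.4f}, max error {err.max().item():.4f}")
```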

SGD Implicitly Regularizes Generalization Error

no code implementations 10 Apr 2021 Daniel A. Roberts

We derive a simple and model-independent formula for the change in the generalization gap due to a gradient descent update.

Stochastic Optimization
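
The abstract does not reproduce the formula, but its flavor follows from a first-order Taylor expansion (a sketch under that assumption, not necessarily the paper's exact result): for a full-batch step θ → θ − η∇L_train, the gap L_test − L_train changes by approximately η ∇L_train · (∇L_train − ∇L_test). A quick numerical check on a made-up linear regression:

```python
import torch

torch.manual_seed(0)
w = torch.randn(20, requires_grad=True)
Xtr, Xte = torch.randn(64, 20), torch.randn(64, 20)
ytr, yte = torch.randn(64), torch.randn(64)

def loss(X, y):
    return ((X @ w - y) ** 2).mean()

eta = 0.01
g_tr = torch.autograd.grad(loss(Xtr, ytr), w)[0]
g_te = torch.autograd.grad(loss(Xte, yte), w)[0]

# First-order prediction for the change in gap = L_test - L_train
pred = eta * (g_tr @ (g_tr - g_te))

with torch.no_grad():
    gap_before = loss(Xte, yte) - loss(Xtr, ytr)
    w -= eta * g_tr  # one gradient descent step on the training loss
    gap_after = loss(Xte, yte) - loss(Xtr, ytr)

print(f"predicted Δgap {pred.item():.5f}, "
      f"actual Δgap {(gap_after - gap_before).item():.5f}")
```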

The Principles of Deep Learning Theory

no code implementations 18 Jun 2021 Daniel A. Roberts, Sho Yaida, Boris Hanin

This book develops an effective theory approach to understanding deep neural networks of practical relevance.

Inductive Bias, Learning Theory +1

A Solvable Model of Neural Scaling Laws

no code implementations 30 Oct 2022 Alexander Maloney, Daniel A. Roberts, James Sully

Large language models with a huge number of parameters, when trained on a near internet-sized number of tokens, have been empirically shown to obey neural scaling laws: specifically, their performance behaves predictably as a power law in either parameters or dataset size until bottlenecked by the other resource.
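
For concreteness, here is a minimal sketch of what "behaves predictably as a power law" means in practice; the constants and exponent below are made up for illustration (loosely inspired by published LLM scaling fits) and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical loss curve L(N) = a * N**(-alpha) + L_inf, plus small noise
N = np.logspace(6, 10, 20)                       # parameter counts
L = 3.0 * N ** -0.076 + 1.69 + rng.normal(0, 1e-3, N.size)

# Recover the exponent by linear regression in log-log space on the excess loss
L_inf = 1.69                                     # assumed known irreducible loss
slope, intercept = np.polyfit(np.log(N), np.log(L - L_inf), 1)
print(f"fitted exponent alpha ≈ {-slope:.3f}")   # should recover ≈ 0.076
```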

Feature Learning and Generalization in Deep Networks with Orthogonal Weights

no code implementations 11 Oct 2023 Hannah Day, Yonatan Kahn, Daniel A. Roberts

Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network.
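
A hedged sketch of the contrast the paper builds on, using a deep linear network for simplicity (the paper treats nonlinear networks at criticality): orthogonal weight matrices preserve signal norms exactly, while critically scaled Gaussian weights preserve them only on average, with fluctuations that compound with depth:

```python
import torch

torch.manual_seed(0)
depth, width = 50, 512
x_orth = x_gauss = torch.randn(width)

for _ in range(depth):
    W_orth = torch.empty(width, width)
    torch.nn.init.orthogonal_(W_orth)                    # exactly norm-preserving
    W_gauss = torch.randn(width, width) / width ** 0.5   # critical Gaussian scaling
    x_orth = W_orth @ x_orth
    x_gauss = W_gauss @ x_gauss

print(f"after {depth} linear layers: |x| orthogonal = {x_orth.norm().item():.2f}, "
      f"Gaussian = {x_gauss.norm().item():.2f}, input norm ≈ {width ** 0.5:.2f}")
```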

The Unreasonable Ineffectiveness of the Deeper Layers

no code implementations 26 Mar 2024 Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

Quantization, Question Answering
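
A minimal sketch of the pruning operation itself, on a GPT-2-style Hugging Face model (the paper works with larger open-weight families and selects which contiguous block of layers to drop using a similarity measure across layers; here the block indices are hardcoded purely for illustration):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the GPT-2 layout, where model.transformer.h is the block ModuleList
model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")

blocks = model.transformer.h
start, n_drop = 6, 3  # hypothetical choice of contiguous block to remove
kept = [b for i, b in enumerate(blocks) if not (start <= i < start + n_drop)]
model.transformer.h = nn.ModuleList(kept)
model.config.n_layer = len(kept)

# Sanity check: the pruned model still runs; quality needs benchmarking
out = model.generate(**tok("The theory of deep learning", return_tensors="pt"),
                     max_new_tokens=10)
print(tok.decode(out[0]))
```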
