Search Results for author: Andrey Gromov

Found 7 papers, 3 papers with code

Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

no code implementations • 23 Nov 2021 • Darshil Doshi, Tianyu He, Andrey Gromov

We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.

AutoInit: Automatic Initialization via Jacobian Tuning

no code implementations • 27 Jun 2022 • Tianyu He, Darshil Doshi, Andrey Gromov

Good initialization is essential for training Deep Neural Networks (DNNs).

Grokking modular arithmetic

1 code implementation • 6 Jan 2023 • Andrey Gromov

We present a simple neural network that can learn modular arithmetic tasks and exhibits a sudden jump in generalization known as "grokking".

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

1 code implementation • 19 Oct 2023 • Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov

Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large.

Memorization

Bridging Associative Memory and Probabilistic Modeling

no code implementations • 15 Feb 2024 • Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions.

In-Context Learning

The Unreasonable Ineffectiveness of the Deeper Layers

1 code implementation • 26 Mar 2024 • Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

Quantization • Question Answering

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

no code implementations • 1 Apr 2024 • Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs?

Image Generation
