Search Results for author: Andrey Gromov

Found 12 papers, 5 papers with code

PARQ: Piecewise-Affine Regularized Quantization

no code implementations • 19 Mar 2025 • Lisa Jin, Jianhao Ma, Zechun Liu, Andrey Gromov, Aaron Defazio, Lin Xiao

We develop a principled method for quantization-aware training (QAT) of large-scale machine learning models.

Quantization
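
For orientation on the QAT setting only, here is a minimal fake-quantization sketch in PyTorch using a straight-through estimator. This is not the PARQ method, which per its title is built on piecewise-affine regularization rather than the straight-through trick; the 4-bit symmetric uniform grid below is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through gradient."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max() / qmax + 1e-12
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    # Forward pass sees quantized weights; backward treats the rounding as identity.
    return w + (w_q - w).detach()

class QATLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quantize(self.weight), self.bias)

layer = QATLinear(16, 8)
out = layer(torch.randn(2, 16))  # trains like a regular Linear while simulating 4-bit weights
```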

Spectral Journey: How Transformers Predict the Shortest Path

no code implementations • 12 Feb 2025 • Andrew Cohen, Andrey Gromov, Kaiyu Yang, Yuandong Tian

In this setting, the representations and the dynamics learned by the model are interpretable.

Decoder • Graph Embedding

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

no code implementations • 13 Jun 2024 • Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

We then leverage tools from information theory to show that such embeddings maximize a well-known lower bound on mutual information between views, thereby connecting the geometric perspective of MMCR to the information-theoretic perspective commonly discussed in MVSSL.

Self-Supervised Learning

Grokking Modular Polynomials

no code implementations • 5 Jun 2024 • Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest.

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

no code implementations • 1 Apr 2024 • Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs?

Image Generation

The Unreasonable Ineffectiveness of the Deeper Layers

3 code implementations • 26 Mar 2024 • Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

Quantization • Question Answering
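
To make the pruning setup above concrete, here is a hedged sketch of removing a contiguous block of decoder layers from an open-weight model with Hugging Face transformers. It assumes a Llama-style architecture whose blocks live in `model.model.layers` and uses an illustrative checkpoint name; the paper additionally chooses which block to drop by comparing layer representations and lightly fine-tunes afterward, both of which are omitted here.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Assumption: a Llama-style model exposing its decoder blocks as `model.model.layers`.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

layers = model.model.layers                # nn.ModuleList of transformer blocks
n_layers = len(layers)
n_drop = n_layers // 4                     # prune roughly a quarter of the depth
start = n_layers - n_drop - 1              # a block of deep (but not final) layers

kept = [layer for i, layer in enumerate(layers)
        if not (start <= i < start + n_drop)]
model.model.layers = nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)
# Note: generation with a KV cache may require re-assigning per-layer indices;
# that bookkeeping is omitted in this sketch.

print(f"kept {len(kept)} of {n_layers} decoder layers")
```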

Bridging Associative Memory and Probabilistic Modeling

no code implementations • 15 Feb 2024 • Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions.

In-Context Learning
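
The correspondence mentioned in the abstract can be written compactly: a Boltzmann-type distribution turns an energy function into a probability model, so that, up to the normalizer, lowering energy and raising log-likelihood are the same thing. This is the standard energy-based-model identity, shown here for orientation rather than as the paper's derivation.

```latex
p_\theta(x) = \frac{e^{-E_\theta(x)}}{Z_\theta},
\qquad
Z_\theta = \int e^{-E_\theta(x)}\,\mathrm{d}x,
\qquad
-\log p_\theta(x) = E_\theta(x) + \log Z_\theta .
```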

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

1 code implementation • 19 Oct 2023 • Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov

Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large.

Memorization

Grokking modular arithmetic

1 code implementation • 6 Jan 2023 • Andrey Gromov

We present a simple neural network that can learn modular arithmetic tasks and exhibits a sudden jump in generalization known as "grokking".
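
A minimal sketch of a grokking-style run on modular addition is shown below, assuming one-hot encoded input pairs, a one-hidden-layer MLP, cross-entropy loss, and full-batch AdamW with strong weight decay. The paper's own architecture, loss, and optimizer differ, so treat this purely as an illustration of the task and the 50% train/test split.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

p = 97                                           # modulus (assumed)
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
targets = (pairs[:, 0] + pairs[:, 1]) % p        # labels for (a + b) mod p
inputs = torch.cat([F.one_hot(pairs[:, 0], p),
                    F.one_hot(pairs[:, 1], p)], dim=1).float()

perm = torch.randperm(p * p)
train, test = perm[: p * p // 2], perm[p * p // 2:]   # half of all pairs for training

model = nn.Sequential(nn.Linear(2 * p, 512), nn.ReLU(), nn.Linear(512, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(100_000):                      # grokking typically needs many steps
    loss = F.cross_entropy(model(inputs[train]), targets[train])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = (model(inputs[test]).argmax(-1) == targets[test]).float().mean()
        # Test accuracy jumps long after the training loss has collapsed.
        print(step, round(loss.item(), 4), round(acc.item(), 3))
```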

AutoInit: Automatic Initialization via Jacobian Tuning

no code implementations • 27 Jun 2022 • Tianyu He, Darshil Doshi, Andrey Gromov

Good initialization is essential for training Deep Neural Networks (DNNs).

Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

1 code implementation • 23 Nov 2021 • Darshil Doshi, Tianyu He, Andrey Gromov

We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
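
As a numerical companion to the quantity in the abstract, the sketch below estimates the norm of a partial Jacobian ∂h_l/∂h_{l0} for a deep fully connected network with LayerNorm, using autograd. The width, depth, nonlinearity, and the per-unit normalization of the norm are assumptions; the paper itself works through recurrence relations rather than explicit Jacobians.

```python
import torch
import torch.nn as nn

width, depth = 256, 20
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(width, width), nn.LayerNorm(width), nn.Tanh())
    for _ in range(depth)
)

def forward_from(h: torch.Tensor, start: int, stop: int) -> torch.Tensor:
    """Propagate a hidden state from layer `start` up to (but excluding) layer `stop`."""
    for block in blocks[start:stop]:
        h = block(h)
    return h

h0 = torch.randn(width)             # hidden state at layer l0
l0, l = 0, depth                    # measure d h_depth / d h_0
J = torch.autograd.functional.jacobian(lambda x: forward_from(x, l0, l), h0)
# One possible normalization: squared Frobenius norm per unit of width.
print("average squared partial-Jacobian norm:", (J ** 2).sum().item() / width)
```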
