Search Results for author: Darshil Doshi

Found 6 papers, 3 papers with code

(How) Can Transformers Predict Pseudo-Random Numbers?

no code implementations • 14 Feb 2025 • Tao Tao, Darshil Doshi, Dayal Singh Kalra, Tianyu He, Maissam Barkeshli

Our analysis reveals that with sufficient architectural capacity and training data variety, Transformers can perform in-context prediction of LCG sequences with unseen moduli ($m$) and parameters ($a, c$).
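
For context, a linear congruential generator (LCG) produces its sequence via the recurrence $x_{n+1} = (a \cdot x_n + c) \bmod m$. Below is a minimal sketch of generating such a sequence; the specific values of $m$, $a$, $c$, the seed, and the sequence length are arbitrary placeholders, not parameters from the paper.

```python
def lcg_sequence(m, a, c, seed, length):
    """Generate `length` terms of the linear congruential generator
    x_{n+1} = (a * x_n + c) mod m, starting from `seed`."""
    xs = [seed % m]
    for _ in range(length - 1):
        xs.append((a * xs[-1] + c) % m)
    return xs

# Arbitrary example parameters (not taken from the paper).
print(lcg_sequence(m=512, a=45, c=123, seed=7, length=10))
```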

Grokking Modular Polynomials

no code implementations • 5 Jun 2024 • Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov

Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest.
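
As a concrete illustration, one commonly studied task of this kind is modular addition, $(a + b) \bmod p$. The sketch below builds such a dataset; the modulus $p$, the train/test split, and the random seed are illustrative assumptions, not the paper's setup.

```python
import random

def modular_addition_dataset(p=97, train_frac=0.5, seed=0):
    """Enumerate all pairs (a, b) with label (a + b) mod p and split them
    into train/test sets. p and train_frac are illustrative choices."""
    pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    random.Random(seed).shuffle(pairs)
    cut = int(train_frac * len(pairs))
    return pairs[:cut], pairs[cut:]

train, test = modular_addition_dataset()
print(len(train), len(test), train[0])
```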

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

1 code implementation • 19 Oct 2023 • Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov

Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large.

Task: Memorization
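
To make "corrupted algorithmic datasets" concrete, here is a hypothetical sketch of corrupting a fraction of labels in a modular-addition dataset; the corruption fraction and mechanism are illustrative assumptions, not the paper's exact protocol.

```python
import random

def corrupt_labels(dataset, p=97, corrupt_frac=0.2, seed=0):
    """Replace the label of a random corrupt_frac fraction of examples with a
    random incorrect residue mod p. Fraction and scheme are illustrative."""
    rng = random.Random(seed)
    data = list(dataset)
    n_corrupt = int(corrupt_frac * len(data))
    for i in rng.sample(range(len(data)), n_corrupt):
        a, b, y = data[i]
        wrong = rng.randrange(p - 1)
        data[i] = (a, b, wrong if wrong < y else wrong + 1)  # guaranteed != y
    return data

clean = [(a, b, (a + b) % 97) for a in range(97) for b in range(97)]
noisy = corrupt_labels(clean)
```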

AutoInit: Automatic Initialization via Jacobian Tuning

no code implementations • 27 Jun 2022 • Tianyu He, Darshil Doshi, Andrey Gromov

Good initialization is essential for training Deep Neural Networks (DNNs).

Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications

1 code implementation • 23 Nov 2021 • Darshil Doshi, Tianyu He, Andrey Gromov

We derive recurrence relations for the norms of partial Jacobians and utilize these relations to analyze criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
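
As a rough numerical illustration (not the paper's analytical recurrence relations), the norm of a partial Jacobian, i.e. the derivative of a deeper layer's activations with respect to an earlier layer's input, can be estimated by autodifferentiating through a stack of fully connected layers. The architecture, depth, width, and the mean-squared-entry norm below are illustrative assumptions.

```python
import torch
import torch.nn as nn

depth, width = 10, 128
layers = [nn.Sequential(nn.Linear(width, width), nn.Tanh()) for _ in range(depth)]

def forward_from(h, start):
    """Run the network from layer `start` to the final layer, given input h."""
    for layer in layers[start:]:
        h = layer(h)
    return h

# Propagate a random input up to layer l0 (the layer we differentiate w.r.t.).
l0 = 3
h_l0 = torch.randn(width)
for layer in layers[:l0]:
    h_l0 = layer(h_l0)

# Partial Jacobian of the final output with respect to the layer-l0 input,
# summarized by the mean squared entry (a simple norm estimate).
J = torch.autograd.functional.jacobian(lambda h: forward_from(h, l0), h_l0)
print(J.shape, (J ** 2).mean().item())
```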
