1 code implementation • 19 Oct 2023 • Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov
Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large.
no code implementations • 27 Jun 2022 • Tianyu He, Darshil Doshi, Andrey Gromov
Good initialization is essential for training Deep Neural Networks (DNNs).
no code implementations • 23 Nov 2021 • Darshil Doshi, Tianyu He, Andrey Gromov
We derive recurrence relations for the norms of partial Jacobians and use them to analyze the criticality of deep fully connected neural networks with LayerNorm and/or residual connections.
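
To make the quantity concrete, here is a minimal numerical sketch. It is not the paper's derivation. It assumes a bias-free fully connected ReLU network with Gaussian weights of variance `sigma_w2 / width`, with no LayerNorm and no residual connections. The function name `partial_jacobian_norm` and all parameter names are hypothetical. It estimates the width-normalized squared Frobenius norm of the partial Jacobian between the first and last preactivations from one random draw.

```python
import numpy as np

def partial_jacobian_norm(sigma_w2, depth=10, width=256, seed=0):
    """Estimate (1/width) * ||d h^depth / d h^0||_F^2 for a random ReLU MLP.

    Assumed (illustrative) setup: h^{l+1} = W^{l+1} relu(h^l), no biases,
    W entries drawn i.i.d. from N(0, sigma_w2 / width).
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)   # preactivations at the starting layer
    J = np.eye(width)                # Jacobian of h^0 with respect to itself
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(sigma_w2 / width)
        d = (h > 0).astype(float)    # ReLU derivative evaluated at h^l
        J = (W * d) @ J              # chain rule: W diag(d) J
        h = W @ np.maximum(h, 0.0)   # forward pass to the next layer
    return np.sum(J ** 2) / width

if __name__ == "__main__":
    for s in (1.0, 2.0, 4.0):
        print(f"sigma_w2={s}: {partial_jacobian_norm(s):.3e}")
```

Under these assumptions, each layer multiplies the expected norm by roughly `sigma_w2 / 2`. The norm therefore decays with depth for `sigma_w2 < 2`, grows for `sigma_w2 > 2`, and stays of order one near `sigma_w2 = 2`, the usual critical initialization for ReLU.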