no code implementations • 11 Aug 2024 • Brett W. Larsen, Tamara G. Kolda, Anru R. Zhang, Alex H. Williams
We refer to tensors with some infinite-dimensional modes as quasitensors, and we call the approach of decomposing a tensor with some continuous RKHS modes CP-HiFi (hybrid infinite and finite dimensional) tensor decomposition.
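To fix ideas, the sketch below fits an ordinary finite-dimensional CP (CANDECOMP/PARAFAC) decomposition of a 3-way tensor by alternating least squares in NumPy. This is only the baseline that CP-HiFi extends, not the CP-HiFi algorithm itself; CP-HiFi would additionally model the factors of the continuous (RKHS) modes as functions rather than matrices. All function names here are illustrative.

```python
# Minimal CP-ALS sketch for a 3-way tensor (NumPy only).
# Not the authors' CP-HiFi method; it shows the finite-dimensional CP model
# X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r] that CP-HiFi generalizes.
import numpy as np

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product; row (i, j) of the result is U[i] * V[j]."""
    r = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, r)

def unfold(X, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iters=100):
    """Alternating least squares for the rank-`rank` CP model of a 3-way tensor."""
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((n, rank)) for n in X.shape)
    for _ in range(n_iters):
        A = np.linalg.lstsq(khatri_rao(B, C), unfold(X, 0).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), unfold(X, 1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), unfold(X, 2).T, rcond=None)[0].T
    return A, B, C

# Example: recover an exactly rank-3 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (8, 9, 10))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A, B, C = cp_als(X, rank=3)
print(np.linalg.norm(X - np.einsum("ir,jr,kr->ijk", A, B, C)) / np.linalg.norm(X))
```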
no code implementations • 5 Jun 2024 • Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle
In this work, we show how to leverage the smaller, domain-specific datasets by upsampling them relative to CC at the end of training to drive performance improvements on difficult benchmarks.
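As a rough illustration of the idea (not the paper's exact data recipe), the sketch below upsamples small domain-specific sources relative to a large web corpus only in the final phase of training. The dataset names, phase boundary, and mixing weights are hypothetical placeholders.

```python
# Hypothetical sketch of end-of-training upsampling of small domain-specific
# datasets relative to a large web corpus ("cc_web"). Names and ratios are
# placeholders, not the paper's configuration.
import random

def sample_source(step, total_steps, final_fraction=0.1):
    """Pick which dataset the next batch is drawn from.

    For the first (1 - final_fraction) of training, batches come mostly from
    the large web corpus; in the final phase the small domain-specific sets
    are upsampled well above their natural share of the corpus.
    """
    in_final_phase = step >= (1 - final_fraction) * total_steps
    if in_final_phase:
        weights = {"cc_web": 0.4, "code": 0.3, "math": 0.3}      # upsampled mix
    else:
        weights = {"cc_web": 0.95, "code": 0.03, "math": 0.02}   # natural-ish mix
    names, probs = zip(*weights.items())
    return random.choices(names, weights=probs, k=1)[0]

# Example: how often each source is drawn during the last 10% of training.
total = 10_000
counts = {}
for step in range(int(0.9 * total), total):
    src = sample_source(step, total)
    counts[src] = counts.get(src, 0) + 1
print(counts)
```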
no code implementations • 19 Nov 2023 • Sarah E. Harvey, Brett W. Larsen, Alex H. Williams
A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape.
1 code implementation • 9 Oct 2023 • Dean A. Pospisil, Brett W. Larsen, Sarah E. Harvey, Alex H. Williams
Measuring geometric similarity between high-dimensional network representations is a topic of longstanding interest to neuroscience and deep learning.
no code implementations • 6 Oct 2022 • Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite
Third, we show how the flatness of the error landscape at the end of training determines a limit on the fraction of weights that can be pruned at each iteration of IMP.
1 code implementation • 2 Jun 2022 • Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite
A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network.
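For context, the sketch below shows the overall structure of IMP with weight rewinding: train, prune the smallest-magnitude surviving weights, rewind the survivors to their values from early in training, and repeat. The training loop is a stand-in and the hyperparameters are placeholders, not the paper's setup.

```python
# Schematic IMP-with-rewinding loop (NumPy stand-in, not the paper's code).
import numpy as np

def train(weights, mask, n_steps):
    """Placeholder for SGD on the masked network; returns updated weights."""
    rng = np.random.default_rng(0)
    return (weights + 0.01 * rng.standard_normal(weights.shape)) * mask

def imp_with_rewinding(init_weights, prune_frac=0.2, n_rounds=5,
                       rewind_steps=500, total_steps=10_000):
    mask = np.ones_like(init_weights)
    # Train for a few hundred steps and save this early-training rewind point.
    rewind_weights = train(init_weights, mask, rewind_steps)
    weights = rewind_weights.copy()
    for _ in range(n_rounds):
        weights = train(weights, mask, total_steps - rewind_steps)
        # Prune the smallest-magnitude weights that are still unmasked.
        alive = np.abs(weights[mask == 1])
        threshold = np.quantile(alive, prune_frac)
        mask = mask * (np.abs(weights) > threshold)
        # Rewind the surviving weights to their early-training values.
        weights = rewind_weights * mask
    return mask, weights

mask, weights = imp_with_rewinding(np.random.default_rng(1).standard_normal(1000))
print(f"remaining fraction of weights: {mask.mean():.3f}")
```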
1 code implementation • ICLR 2022 • Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli
In particular, we show via Gordon's escape theorem that the training dimension plus the Gaussian width of the desired loss sub-level set, projected onto a unit sphere surrounding the initialization, must exceed the total number of parameters for the success probability to be large.
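Schematically, and in our own notation rather than the paper's (d for the training-subspace dimension, D for the total parameter count, S(ε) for the desired loss sub-level set, w(·) for Gaussian width, with constants and log factors omitted), the condition above can be written as:

```latex
% Schematic restatement of the success condition; notation is ours and constants
% are omitted. In the standard form of Gordon's escape theorem, the Gaussian
% width of the projected set enters squared and is compared against the
% codimension of the random subspace.
d \;+\; w\!\left( P_{\mathbb{S}^{D-1}}\big( S(\epsilon) \big) \right)^{2} \;\gtrsim\; D
```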
1 code implementation • 31 Dec 2019 • Abbas Kazemipour, Brett W. Larsen, Shaul Druckmann
Despite their practical success, a theoretical understanding of the loss landscape of neural networks has proven challenging due to the high-dimensional, non-convex, and highly nonlinear structure of such models.