no code implementations • 8 Mar 2024 • Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna
Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation.
no code implementations • 22 Feb 2024 • Eshaan Nichani, Alex Damian, Jason D. Lee
The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens.
2 code implementations • NeurIPS 2023 • Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.
1 code implementation • 30 Sep 2022 • Alex Damian, Eshaan Nichani, Jason D. Lee
Our analysis provides precise predictions for the loss, sharpness, and deviation from the PGD trajectory throughout training, which we verify both empirically in a number of standard settings and theoretically under mild conditions.
no code implementations • 30 Jun 2022 • Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi
Furthermore, in a transfer learning setup where the data distributions in the source and target domain share the same representation $U$ but have different polynomial heads we show that a popular heuristic for transfer learning has a target sample complexity independent of $d$.
no code implementations • NeurIPS 2021 • Alex Damian, Tengyu Ma, Jason D. Lee
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.
1 code implementation • 9 May 2018 • Yijie Bei, Alex Damian, Shijia Hu, Sachit Menon, Nikhil Ravi, Cynthia Rudin
This work identifies and addresses two important technical challenges in single-image super-resolution: (1) how to upsample an image without magnifying noise and (2) how to preserve large scale structure when upsampling.