Search Results for author: Alex Damian

Found 7 papers, 3 papers with code

Computational-Statistical Gaps in Gaussian Single-Index Models

no code implementations • 8 Mar 2024 • Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna

Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation.
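
For concreteness, the model class the abstract describes is usually written as follows (one standard formulation; exact noise assumptions vary across papers):

$$ y = f(\langle \theta^*, x \rangle), \qquad x \sim \mathcal{N}(0, I_d), \qquad \|\theta^*\|_2 = 1, $$

where $\theta^*$ is the unknown planted direction and $f$ is a generic non-linear link; in the non-deterministic case, $y$ is drawn from a conditional distribution that depends on $x$ only through the projection $\langle \theta^*, x \rangle$.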

How Transformers Learn Causal Structure with Gradient Descent

no code implementations • 22 Feb 2024 • Eshaan Nichani, Alex Damian, Jason D. Lee

The key insight of our proof is that the gradient of the attention matrix encodes the mutual information between tokens.

In-Context Learning
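
The claim about the attention gradient refers to a pairwise statistic that is easy to compute in a toy setting. The sketch below (an illustration under assumptions of ours, not the paper's construction) samples sequences from a first-order Markov chain and estimates the mutual information between token positions:

```python
import numpy as np

# Toy illustration: estimate the mutual information between token positions
# in sequences drawn from a first-order Markov chain. The chain, the sizes,
# and the estimator are our assumptions, not the paper's construction.

rng = np.random.default_rng(0)
V, T, N = 4, 8, 50_000                    # vocab size, sequence length, samples

P = rng.dirichlet(np.ones(V), size=V)     # random row-stochastic transitions
seqs = np.zeros((N, T), dtype=int)
seqs[:, 0] = rng.integers(V, size=N)
for t in range(1, T):
    cum = P[seqs[:, t - 1]].cumsum(axis=1)                 # row-wise CDFs
    u = rng.random((N, 1))
    seqs[:, t] = np.minimum((u > cum).sum(axis=1), V - 1)  # inverse-CDF sample

def mutual_information(a, b, v=V):
    """Plug-in estimate of I(a; b) in nats from paired samples."""
    joint = np.zeros((v, v))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    pa, pb = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])).sum())

print(mutual_information(seqs[:, 3], seqs[:, 4]))  # adjacent pair: large
print(mutual_information(seqs[:, 0], seqs[:, 7]))  # distant pair: near zero
```

Positions linked by the chain's causal structure share far more information than distant pairs, which is the kind of signal the abstract says is encoded in the attention gradient.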

Fine-Tuning Language Models with Just Forward Passes

2 code implementations • NeurIPS 2023 • Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora

Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory.

In-Context Learning • Multiple-choice
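
The title points to zeroth-order optimization: gradients are replaced by finite-difference estimates built from forward passes alone, and the Gaussian perturbation can be regenerated from a stored seed instead of being kept in memory. A minimal sketch of that idea on a toy objective (names and hyperparameters are illustrative, not the released implementation):

```python
import numpy as np

# Zeroth-order update sketch: two perturbed forward passes give a
# directional-derivative estimate, and the parameters move along the
# perturbation. Only the RNG seed is stored, so the perturbation itself
# never occupies memory. Toy quadratic loss, illustrative hyperparameters.

def loss(theta):                           # stand-in for a model forward pass
    return float(np.sum((theta - 1.0) ** 2))

theta = np.zeros(10)
eps, lr = 1e-3, 0.05

for step in range(200):
    seed = step                            # remember the seed, not z itself
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    g = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    z = np.random.default_rng(seed).standard_normal(theta.shape)  # replay z
    theta -= lr * g * z                    # step along z, scaled by the estimate

print(loss(theta))                         # close to 0
```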

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability

1 code implementation • 30 Sep 2022 • Alex Damian, Eshaan Nichani, Jason D. Lee

Our analysis provides precise predictions for the loss, sharpness, and deviation from the PGD trajectory throughout training, which we verify both empirically in a number of standard settings and theoretically under mild conditions.
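
Here "sharpness" means the largest eigenvalue of the loss Hessian, the quantity that hovers near $2/\eta$ when gradient descent with step size $\eta$ runs at the edge of stability. A minimal sketch of how it is typically measured, via power iteration on Hessian-vector products (toy quadratic loss with a known answer; not the paper's code):

```python
import numpy as np

# Estimate sharpness (top Hessian eigenvalue) by power iteration, using
# finite-difference Hessian-vector products so only gradients are needed.
# Toy loss L(theta) = 0.5 * theta^T A theta, whose Hessian is A.

A = np.diag([10.0, 3.0, 1.0])               # Hessian; lambda_max = 10

def grad(theta):
    return A @ theta                         # gradient of the toy loss

def hvp(theta, v, eps=1e-5):                 # Hessian-vector product
    return (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)

def sharpness(theta, iters=100):
    v = np.random.default_rng(0).standard_normal(theta.shape)
    for _ in range(iters):
        v = hvp(theta, v)
        v /= np.linalg.norm(v)
    return float(v @ hvp(theta, v))          # Rayleigh quotient

print(sharpness(np.ones(3)))                 # ~10.0
```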

Neural Networks can Learn Representations with Gradient Descent

no code implementations30 Jun 2022 Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

Furthermore, in a transfer learning setup where the data distributions in the source and target domains share the same representation $U$ but have different polynomial heads, we show that a popular heuristic for transfer learning has a target sample complexity independent of $d$.

Transfer Learning
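
One way to write the setup the abstract describes (all notation beyond $U$ is ours):

$$ y_{\text{src}} = p(Ux) + \varepsilon, \qquad y_{\text{tgt}} = q(Ux) + \varepsilon', $$

with a shared representation $U \in \mathbb{R}^{r \times d}$, $r \ll d$, and distinct polynomial heads $p \neq q$. The heuristic fits $U$ on the source task and reuses it on the target, so only the low-dimensional head remains to be learned there, which is why the target sample complexity can avoid depending on $d$.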

Label Noise SGD Provably Prefers Flat Global Minimizers

no code implementations • NeurIPS 2021 • Alex Damian, Tengyu Ma, Jason D. Lee

In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.
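
"Label noise SGD" means adding fresh noise to the labels at every step before computing the minibatch gradient. A minimal sketch of the mechanism on toy linear regression (model and hyperparameters are illustrative, not the paper's setup):

```python
import numpy as np

# SGD with fresh label noise: each step perturbs the minibatch labels before
# taking a squared-loss gradient step. Toy underparametrized regression just
# to show the mechanism; the paper's analysis concerns overparametrized models.

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 5))
y = X @ rng.standard_normal(5)

theta = np.zeros(5)
lr, sigma, batch = 0.05, 0.5, 32

for step in range(2000):
    idx = rng.integers(len(X), size=batch)
    y_noisy = y[idx] + sigma * rng.standard_normal(batch)  # fresh noise each step
    residual = X[idx] @ theta - y_noisy
    theta -= lr * X[idx].T @ residual / batch

print(np.round(theta - np.linalg.lstsq(X, y, rcond=None)[0], 2))  # near zero
```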

New Techniques for Preserving Global Structure and Denoising with Low Information Loss in Single-Image Super-Resolution

1 code implementation • 9 May 2018 • Yijie Bei, Alex Damian, Shijia Hu, Sachit Menon, Nikhil Ravi, Cynthia Rudin

This work identifies and addresses two important technical challenges in single-image super-resolution: (1) how to upsample an image without magnifying noise, and (2) how to preserve large-scale structure when upsampling.

Denoising • Image Super-Resolution
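
Challenge (1) is easy to see numerically: naive upsampling carries input noise straight through to the output, so denoising cannot be an afterthought. A tiny illustration with nearest-neighbor upsampling (not the paper's method):

```python
import numpy as np

# Nearest-neighbor 4x upsampling via np.kron: the noise in the input
# survives, pixel for pixel, in the upsampled output.

rng = np.random.default_rng(0)
clean = rng.random((32, 32))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

up_noisy = np.kron(noisy, np.ones((4, 4)))   # 4x nearest-neighbor upsample
up_clean = np.kron(clean, np.ones((4, 4)))

print(np.std(up_noisy - up_clean))           # ~0.1: the noise is still there
```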
