no code implementations • 2 Feb 2025 • Stephen Zhang, Mustafa Khan, Vardan Papyan
Two prominent features of large language models (LLMs) are the presence of large-norm (outlier) features and the tendency for tokens to attend very strongly to a select few tokens.
no code implementations • 20 Sep 2024 • Stephen Zhang, Vardan Papyan
The recent paradigm shift to large-scale foundation models has brought about a new era for deep learning that, while highly successful in practice, has been plagued by prohibitively high memory and compute costs.
no code implementations • 10 Jul 2024 • Murdock Aubry, Haoming Meng, Anton Sugolov, Vardan Papyan
Large Language Models (LLMs) have made significant strides in natural language processing, and a precise understanding of the internal mechanisms driving their success is essential.
1 code implementation • 4 Jul 2024 • Stephen Zhang, Vardan Papyan
Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models remains largely unexplored.
no code implementations • 2 Jul 2024 • David Glukhov, Ziwen Han, Ilia Shumailov, Vardan Papyan, Nicolas Papernot
To quantify these risks, we introduce a new safety evaluation framework based on impermissible information leakage of model outputs and demonstrate how our proposed question-decomposition attack can extract dangerous knowledge from a censored LLM more effectively than traditional jailbreaking.
2 code implementations • 28 May 2024 • Robert Wu, Vardan Papyan
Neural collapse ($\mathcal{NC}$) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers.
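A minimal sketch of the two collapse statistics mentioned above (equinorm and equiangular class means), computed on synthetic features constructed to have already collapsed; all names and the construction are illustrative, not taken from the paper's code.

```python
import numpy as np

# Illustrative sketch: measure equinorm/equiangularity of class means on
# synthetic top-layer features that sit near simplex-ETF-like vertices.
rng = np.random.default_rng(0)
C, n, d = 4, 50, 16  # classes, samples per class, feature dimension

# Orthonormal class directions, globally centered, plus small noise.
M = np.linalg.qr(rng.standard_normal((d, C)))[0][:, :C]
means = M - M.mean(axis=1, keepdims=True)
feats = means[:, :, None] + 0.01 * rng.standard_normal((d, C, n))

mu = feats.mean(axis=2)                       # empirical class means (d x C)
mu_c = mu - mu.mean(axis=1, keepdims=True)    # centered class means

norms = np.linalg.norm(mu_c, axis=0)
equinorm_dev = norms.std() / norms.mean()     # ~0 when means are equinorm

G = (mu_c / norms).T @ (mu_c / norms)         # pairwise cosines
off_diag = G[~np.eye(C, dtype=bool)]
equiangular_dev = off_diag.std()              # ~0 when angles all match

print(equinorm_dev, equiangular_dev)
```

With centered orthonormal class directions, the pairwise cosines all equal $-1/(C-1)$, so both deviation statistics come out near zero.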
no code implementations • 9 Feb 2024 • Quinn Fisher, Haoming Meng, Vardan Papyan
These findings are unexpected, as mixed-up features are not simple convex combinations of feature class means (as one might get, for example, by training mixup with the mean squared error loss).
no code implementations • NeurIPS 2023 • Jianing Li, Vardan Papyan
Our measurements reveal a process called Residual Alignment (RA) characterized by four properties: (RA1) intermediate representations of a given input are equispaced on a line, embedded in high dimensional space, as observed by Gai and Zhang [2021]; (RA2) top left and right singular vectors of Residual Jacobians align with each other and across different depths; (RA3) Residual Jacobians are at most rank C for fully-connected ResNets, where C is the number of classes; and (RA4) top singular values of Residual Jacobians scale inversely with depth.
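A toy sketch of the kind of measurement behind (RA2): comparing the top singular subspaces of two matrices standing in for Residual Jacobians. The construction below (two rank-$C$ matrices sharing their singular directions) is ours, purely for illustration.

```python
import numpy as np

# Illustrative sketch: principal angles between the top-left singular
# subspaces of two rank-C "Residual Jacobians" built to share directions.
rng = np.random.default_rng(3)
d, C = 32, 4  # width and number of classes

U = np.linalg.qr(rng.standard_normal((d, C)))[0]
V = np.linalg.qr(rng.standard_normal((d, C)))[0]

# Two rank-C matrices with common left/right singular subspaces.
J1 = U @ np.diag([4.0, 3.0, 2.0, 1.0]) @ V.T
J2 = U @ np.diag([2.0, 1.5, 1.0, 0.5]) @ V.T

U1 = np.linalg.svd(J1)[0][:, :C]
U2 = np.linalg.svd(J2)[0][:, :C]

# Cosines of the principal angles between the two subspaces: all ~1
# indicates the top singular subspaces are aligned across "depths".
cos_angles = np.linalg.svd(U1.T @ U2, compute_uv=False)
print(cos_angles)
```

The same principal-angle computation applied to actual Residual Jacobians at different depths is one way to quantify the alignment that (RA2) describes.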
no code implementations • 29 Dec 2023 • Benjamin Eyre, Elliot Creager, David Madras, Vardan Papyan, Richard Zemel
Designing deep neural network classifiers that perform robustly on distributions differing from the available training data is an active area of machine learning research.
no code implementations • 20 Jul 2023 • David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities.
1 code implementation • ICLR 2022 • X. Y. Han, Vardan Papyan, David L. Donoho
The analytically-tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC.
1 code implementation • 27 Aug 2020 • Vardan Papyan
Numerous researchers recently applied empirical spectral analysis to the study of modern deep learning classifiers.
1 code implementation • 18 Aug 2020 • Vardan Papyan, X. Y. Han, David L. Donoho
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero while the training loss is pushed towards zero.
no code implementations • 10 Jun 2019 • Morteza Mardani, Qingyun Sun, Vardan Papyan, Shreyas Vasanawala, John Pauly, David Donoho
Leveraging Stein's Unbiased Risk Estimator (SURE), this paper analyzes the generalization risk of recurrent unrolled networks through its bias and variance components.
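A self-contained numerical sketch of the SURE identity the analysis leverages, for the simplest possible denoiser (linear shrinkage $f(y)=ay$, whose divergence is $an$); the setup is ours and only illustrates the estimator, not the paper's unrolled-network analysis.

```python
import numpy as np

# Illustrative sketch: for y = x + sigma*z with z ~ N(0, I), SURE
#   ||f(y)-y||^2/n + 2*sigma^2*div(f)/n - sigma^2
# is an unbiased estimate of the per-sample risk ||f(y)-x||^2/n.
rng = np.random.default_rng(4)
n, sigma, a = 100_000, 0.5, 0.8

x = rng.standard_normal(n)
y = x + sigma * rng.standard_normal(n)
f = a * y                    # linear shrinkage denoiser
div_f = a * n                # divergence of y -> a*y is trace(a*I) = a*n

sure = ((f - y) ** 2).sum() / n + 2 * sigma**2 * div_f / n - sigma**2
true_risk = ((f - x) ** 2).sum() / n

print(sure, true_risk)
```

Even though the true signal `x` never enters `sure`, the two quantities agree up to Monte Carlo fluctuations, which is what makes SURE usable as a surrogate for the unobservable risk.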
no code implementations • 17 May 2019 • Vardan Papyan
We apply state-of-the-art tools in modern high-dimensional numerical linear algebra to approximate efficiently the spectrum of the Hessian of modern deepnets, with tens of millions of parameters, trained on real data.
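A hedged sketch of the matrix-free access pattern that such spectrum estimators rely on: Lanczos touches the Hessian only through Hessian-vector products, never forming the matrix. Here a small explicit symmetric matrix with a planted outlier eigenvalue stands in for a deepnet Hessian; in practice the `hvp` would be an autodiff Hessian-vector product.

```python
import numpy as np

# Illustrative stand-in Hessian: scaled Wigner bulk plus one planted spike,
# mimicking the bulk-and-outliers structure of deepnet Hessian spectra.
rng = np.random.default_rng(1)
d, k = 300, 60
A = rng.standard_normal((d, d))
H = (A + A.T) / np.sqrt(2 * d)
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
H += 3.0 * np.outer(u, u)              # planted outlier eigenvalue

def hvp(v):
    # In a real deepnet: autodiff Hessian-vector product, no explicit H.
    return H @ v

def lanczos_eigvals(matvec, d, k, rng):
    """Ritz values from k Lanczos steps with full reorthogonalization."""
    Q = np.zeros((d, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    q = rng.standard_normal(d)
    q /= np.linalg.norm(q)
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)  # reorthogonalize
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)

ritz = lanczos_eigvals(hvp, d, k, rng)
exact_top = np.linalg.eigvalsh(H)[-1]
print(ritz[-1], exact_top)
```

Because the spike sits well outside the bulk, the top Ritz value converges to the outlier eigenvalue in far fewer iterations than the dimension, which is what makes this approach tractable at tens of millions of parameters.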
1 code implementation • 24 Jan 2019 • Vardan Papyan
We expose a structure in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian.
2 code implementations • 16 Nov 2018 • Vardan Papyan
We decompose the Hessian into different components and study the dynamics with training and sample size of each term individually.
1 code implementation • NeurIPS 2018 • Morteza Mardani, Qingyun Sun, Shreyas Vasanawala, Vardan Papyan, Hatef Monajemi, John Pauly, David Donoho
Recovering high-resolution images from limited sensory data typically leads to a severely ill-posed inverse problem, demanding inversion algorithms that effectively capture the prior information.
no code implementations • 27 Nov 2017 • Morteza Mardani, Hatef Monajemi, Vardan Papyan, Shreyas Vasanawala, David Donoho, John Pauly
Building effective priors is, however, challenged by the low training and testing overhead dictated by real-time tasks, and by the need to retrieve visually "plausible" and physically "feasible" images with minimal hallucination.
no code implementations • 29 Aug 2017 • Jeremias Sulam, Vardan Papyan, Yaniv Romano, Michael Elad
We show that the training of the filters is essential to allow for non-trivial signals in the model, and we derive an online algorithm to learn the dictionaries from real data, effectively resulting in cascaded sparse convolutional layers.
1 code implementation • ICCV 2017 • Vardan Papyan, Yaniv Romano, Jeremias Sulam, Michael Elad
Convolutional Sparse Coding (CSC) is an increasingly popular model in the signal and image processing communities, tackling some of the limitations of traditional patch-based sparse representations.
no code implementations • 25 Nov 2016 • Vardan Papyan, Ronen Talmon
The first step in our analysis is to find the common source of variability present in all sensor measurements.
no code implementations • 27 Jul 2016 • Vardan Papyan, Yaniv Romano, Michael Elad
This is shown to be tightly connected to CNNs, so much so that the forward pass of a CNN is in fact the thresholding pursuit serving the ML-CSC model.
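The correspondence in the sentence above can be sketched in a few lines: one step of non-negative soft-thresholding pursuit over a dictionary $D$ computes exactly what a ReLU layer with weights $D^T$ and bias $-\lambda$ computes. The dictionary and signal below are random placeholders, not the paper's setup.

```python
import numpy as np

# Minimal sketch: one non-negative soft-thresholding pursuit step over a
# dictionary D coincides with ReLU(Wx + b) for W = D^T, b = -lambda.
rng = np.random.default_rng(2)
n, m = 8, 12
D = rng.standard_normal((n, m))        # dictionary: n-dim signal, m atoms
x = rng.standard_normal(n)             # input signal
lam = 0.5                              # threshold / bias magnitude

def soft_nonneg(z, t):
    """Non-negative soft-thresholding operator."""
    return np.maximum(z - t, 0.0)

gamma = soft_nonneg(D.T @ x, lam)      # one thresholding-pursuit step
relu_out = np.maximum(D.T @ x - lam, 0.0)  # a ReLU layer's forward pass

print(np.array_equal(gamma, relu_out))
```

The two expressions are identical term by term, which is the elementary observation underlying the CNN / ML-CSC connection; the full argument in the paper extends this layer-wise across the cascade.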