no code implementations • 14 Feb 2024 • Yashas Samaga B L, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli
Autoregressive decoding with generative Large Language Models (LLMs) on accelerators (GPUs/TPUs) is often memory-bound: most of the time is spent transferring model parameters from high-bandwidth memory (HBM) to cache.
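As a back-of-the-envelope illustration of why decoding is memory-bound (all figures below are hypothetical and not from the paper): each decode step must stream every model weight from HBM at least once, so HBM bandwidth lower-bounds per-token latency.

```python
# Roofline-style lower bound on per-token decode latency for a
# memory-bound model: each step reads all weights from HBM once.
def min_decode_latency_s(num_params, bytes_per_param, hbm_bandwidth):
    """Seconds needed to stream all weights once from HBM."""
    return num_params * bytes_per_param / hbm_bandwidth

# Illustrative numbers: 7e9 parameters in bf16 (2 bytes) at 1 TB/s.
latency = min_decode_latency_s(7e9, 2, 1e12)
print(f"~{latency * 1e3:.0f} ms per token")  # ~14 ms per token
```

Compute is rarely the bottleneck at batch size 1, which is why techniques that reuse each weight transfer across more tokens help.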
no code implementations • 13 Oct 2023 • Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar
Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a \emph{trade-off}.
no code implementations • 9 Oct 2023 • Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You, Zhihui Zhu
However, most of the existing empirical and theoretical studies in neural collapse focus on the case that the number of classes is small relative to the dimension of the feature space.
no code implementations • 6 Oct 2023 • Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli
Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models.
1 code implementation • 24 Oct 2022 • Xili Dai, Mingyang Li, Pengyuan Zhai, Shengbang Tong, Xingjian Gao, Shao-Lun Huang, Zhihui Zhu, Chong You, Yi Ma
We show that such models have equally strong empirical performance on CIFAR-10, CIFAR-100, and ImageNet datasets when compared to conventional neural networks.
no code implementations • 12 Oct 2022 • Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
This paper studies the curious phenomenon that machine learning models with Transformer architectures have sparse activation maps.
no code implementations • 4 Oct 2022 • Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu
We extend such results and show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse.
no code implementations • 14 Aug 2022 • Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar
In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data.
no code implementations • 2 Mar 2022 • Jinxin Zhou, Xiao Li, Tianyu Ding, Chong You, Qing Qu, Zhihui Zhu
When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero.
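In symbols, writing $K$ for the number of classes and $\tilde{\mathbf{m}}_k$ for the globally centered, normalized mean of the last-layer features of class $k$ (a standard formulation of neural collapse, recalled here for context rather than quoted from the paper), the Simplex ETF structure says the class means are maximally and equally separated:

```latex
\langle \tilde{\mathbf{m}}_k, \tilde{\mathbf{m}}_{k'} \rangle =
\begin{cases}
  1 & k = k', \\
  -\dfrac{1}{K-1} & k \neq k',
\end{cases}
```

with the last-layer classifier weights aligned to the same frame up to scaling.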
1 code implementation • 28 Feb 2022 • Sheng Liu, Zhihui Zhu, Qing Qu, Chong You
In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
Ranked #1 on Learning with noisy labels on CIFAR-10N-Random3
1 code implementation • CVPR 2021 • Shangzhi Zhang, Chong You, René Vidal, Chun-Guang Li
We show that our SENet can not only learn the self-expressive coefficients with desired properties on the training data, but also handle out-of-sample data.
2 code implementations • 21 May 2021 • Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation.
1 code implementation • NeurIPS 2021 • Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu
In contrast to existing landscape analyses for deep neural networks, which are often disconnected from practice, our analysis of the simplified model not only explains what kind of features are learned in the last layer, but also shows why they can be efficiently optimized in the simplified settings, matching the empirical observations in practical deep network architectures.
1 code implementation • NeurIPS 2021 • Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu
Furthermore, we show that our ConvNorm can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network, leading to easier training and improved robustness for deep ConvNets.
no code implementations • CVPR 2021 • Ziyang Wu, Christina Baek, Chong You, Yi Ma
Current deep learning architectures suffer from catastrophic forgetting, a failure to retain knowledge of previously learned classes when incrementally trained on new classes.
3 code implementations • 27 Oct 2020 • Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme.
no code implementations • ICLR 2021 • Benjamin D. Haeffele, Chong You, René Vidal
To extend this approach to data supported on a union of non-linear manifolds, numerous studies have proposed learning an embedding of the original data with a neural network that is regularized by a self-expressive loss on the embedded data, so as to encourage a union-of-linear-subspaces prior in the embedded space.
1 code implementation • ICML 2020 • Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, Jitendra Malik
Initialization, normalization, and skip connections are believed to be three indispensable techniques for training very deep convolutional neural networks and obtaining state-of-the-art performance.
1 code implementation • NeurIPS 2020 • Chong You, Zhihui Zhu, Qing Qu, Yi Ma
This paper shows that with a double over-parameterization for both the low-rank matrix and the sparse corruption, gradient descent with discrepant learning rates provably recovers the underlying matrix even without prior knowledge of either the rank of the matrix or the sparsity of the corruption.
2 code implementations • NeurIPS 2020 • Yaodong Yu, Kwan Ho Ryan Chan, Chong You, Chaobing Song, Yi Ma
To learn intrinsic low-dimensional structures from high-dimensional data that most discriminate between classes, we propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class.
Ranked #15 on Image Clustering on STL-10
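For context, the coding-rate function whose difference $\text{MCR}^2$ maximizes takes the following form (recalled here from memory of the published objective; $\mathbf{Z} \in \mathbb{R}^{d \times n}$ are the features, $\epsilon$ the allowed distortion, $\mathbf{Z}_j$ the $n_j$ features of class $j$):

```latex
R(\mathbf{Z}, \epsilon) = \frac{1}{2}\log\det\!\left(\mathbf{I} + \frac{d}{n\epsilon^2}\,\mathbf{Z}\mathbf{Z}^{\top}\right),
\qquad
\Delta R(\mathbf{Z}) = R(\mathbf{Z}, \epsilon) - \sum_{j} \frac{n_j}{n}\, R(\mathbf{Z}_j, \epsilon).
```

Maximizing $\Delta R$ expands the coding rate of the whole dataset while compressing each class, which is what drives the between-class discrimination.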
no code implementations • 11 Jun 2020 • Jeremias Sulam, Chong You, Zhihui Zhu
We thoroughly demonstrate this observation in practice and provide an analysis of this phenomenon by tying recovery measures to generalization bounds.
no code implementations • 7 Jun 2020 • Chong You, Chi Li, Daniel P. Robinson, Rene Vidal
When the dataset is drawn from a union of independent subspaces, our method is able to select sufficiently many representatives from each subspace.
no code implementations • ICCV 2019 • Chong You, Chun-Guang Li, Daniel P. Robinson, Rene Vidal
Specifically, our analysis provides conditions that guarantee the correctness of affine subspace clustering methods both with and without the affine constraint, and shows that these conditions are satisfied for high-dimensional data.
no code implementations • CVPR 2020 • Ying Chen, Chun-Guang Li, Chong You
State-of-the-art subspace clustering methods are based on the self-expressive model, which represents each data point as a linear combination of the other data points.
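Concretely, the self-expressive model writes each point as a combination of the others (the standard formulation, recalled here for context):

```latex
\mathbf{x}_j = \sum_{i \neq j} c_{ij}\,\mathbf{x}_i
\quad\Longleftrightarrow\quad
\mathbf{X} = \mathbf{X}\mathbf{C},\ \operatorname{diag}(\mathbf{C}) = \mathbf{0},
```

with a regularizer on $\mathbf{C}$ chosen so that each point is preferably expressed by points from its own subspace.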
1 code implementation • ICML 2020 • Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma
We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network.
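The bias/variance measurement can be sketched with a Monte-Carlo decomposition over resampled training sets. This is a toy illustration only: a polynomial least-squares fit stands in for a neural network, and the data and figures are my own assumptions, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(3 * x)

def fit_predict(x_train, y_train, x_test, degree=5):
    # A polynomial least-squares fit stands in for a neural network.
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coeffs, x_test)

# Monte-Carlo estimate of bias^2 and variance over resampled training sets.
x_test = np.linspace(-1, 1, 50)
preds = []
for _ in range(200):
    x_tr = rng.uniform(-1, 1, 30)
    y_tr = true_fn(x_tr) + 0.3 * rng.standard_normal(30)
    preds.append(fit_predict(x_tr, y_tr, x_test))
preds = np.array(preds)

bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)   # squared bias
variance = np.mean(preds.var(axis=0))                            # variance
print(f"bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}")
```

Sweeping the model's capacity (here, `degree`) in such a loop is how one would trace out the monotone-bias / unimodal-variance curves the paper reports for network width.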
no code implementations • 30 Dec 2019 • Daniel P. Robinson, Rene Vidal, Chong You
The goal is to have the representation $c$ correctly identify the subspace, i.e., the nonzero entries of $c$ should correspond to columns of $A$ that are in the subspace $\mathcal{S}_0$.
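A minimal sketch of this setup (the toy data and the greedy solver are my own illustration; this line of work typically analyzes $\ell_1$ minimization rather than orthogonal matching pursuit):

```python
import numpy as np

def omp(A, x, k):
    """Orthogonal matching pursuit: greedily build a k-sparse c with x ≈ A c."""
    residual, support = x.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        sol, *_ = np.linalg.lstsq(A[:, support], x, rcond=None)
        residual = x - A[:, support] @ sol
    c = np.zeros(A.shape[1])
    c[support] = sol
    return c

rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((10, 8)))  # orthonormal dictionary columns
x = A[:, :3] @ np.array([1.0, -2.0, 0.5])          # x lies in S0 = span of columns 0..2
c = omp(A, x, k=3)
print(np.nonzero(c)[0])  # [0 1 2]: the support identifies the subspace
```

With orthonormal columns the recovered support is exact; the interesting regime in the literature is when columns are coherent and such identification can fail.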
no code implementations • CVPR 2019 • Junjian Zhang, Chun-Guang Li, Chong You, Xianbiao Qi, Honggang Zhang, Jun Guo, Zhouchen Lin
However, the applicability of subspace clustering has been limited because practical visual data in raw form do not necessarily lie in such linear subspaces.
Ranked #2 on Image Clustering on Extended Yale-B
no code implementations • ECCV 2018 • Chong You, Chi Li, Daniel P. Robinson, Rene Vidal
Our experiments demonstrate that the proposed method outperforms state-of-the-art subspace clustering methods in two large-scale image datasets that are imbalanced.
no code implementations • 17 Aug 2018 • Chun-Guang Li, Chong You, René Vidal
In this paper, we develop a novel geometric analysis for a variant of SSC, named affine SSC (ASSC), for the problem of clustering data from a union of affine subspaces.
no code implementations • CVPR 2017 • Chong You, Daniel P. Robinson, René Vidal
While outlier detection methods based on robust statistics have existed for decades, only recently have methods based on sparse and low-rank representation been developed along with guarantees of correct outlier detection when the inliers lie in one or more low-dimensional subspaces.
no code implementations • 17 Oct 2016 • Chun-Guang Li, Chong You, René Vidal
In this paper, we propose a joint optimization framework, Structured Sparse Subspace Clustering (S$^3$C), for learning both the affinity and the segmentation.
1 code implementation • CVPR 2016 • Chong You, Chun-Guang Li, Daniel P. Robinson, Rene Vidal
Our geometric analysis also provides a theoretical justification and a geometric interpretation for the balance between the connectedness (due to $\ell_2$ regularization) and subspace-preserving (due to $\ell_1$ regularization) properties for elastic net subspace clustering.
Ranked #7 on Image Clustering on coil-100 (Accuracy metric)
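The elastic-net objective behind this balance, per data point $\mathbf{x}_j$, takes the following standard form (recalled here from memory, so treat the exact weighting as an assumption; $\lambda \in [0,1]$ trades off the two regularizers and $\gamma > 0$ weights the data fit):

```latex
\min_{\mathbf{c}_j}\;
\lambda \|\mathbf{c}_j\|_1
+ \frac{1-\lambda}{2}\|\mathbf{c}_j\|_2^2
+ \frac{\gamma}{2}\|\mathbf{x}_j - \mathbf{X}\mathbf{c}_j\|_2^2
\quad \text{s.t.}\quad c_{jj} = 0.
```

At $\lambda = 1$ this reduces to sparse subspace clustering ($\ell_1$ only, subspace-preserving but poorly connected solutions); at $\lambda = 0$ it reduces to least-squares regression ($\ell_2$ only, dense and well-connected solutions).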
2 code implementations • CVPR 2016 • Chong You, Daniel P. Robinson, Rene Vidal
Subspace clustering methods based on $\ell_1$, $\ell_2$ or nuclear norm regularization have become very popular due to their simplicity, theoretical guarantees and empirical success.
Ranked #6 on Image Clustering on Extended Yale-B