no code implementations • 22 Oct 2024 • Xinyi Ling, Bo Peng, Hanwen Du, Zhihui Zhu, Xia Ning
Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community.
no code implementations • 19 Oct 2024 • Zhen Qin, Zhihui Zhu
We first establish the $\ell_1/\ell_2$-restricted isometry property (RIP) for Gaussian measurement operators, demonstrating that the information in the TT format tensor can be preserved using a number of measurements that grows linearly with $N$.
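For orientation, an $\ell_1/\ell_2$-RIP of this kind typically states (constants and the exact TT-rank class are generic placeholders here, not the paper's precise statement) that $(1-\delta)\|\mathcal{X}\|_F \le \tfrac{1}{m}\|\mathcal{A}(\mathcal{X})\|_1 \le (1+\delta)\|\mathcal{X}\|_F$ holds simultaneously for all order-$N$ tensors $\mathcal{X}$ with bounded TT ranks, where $\mathcal{A}$ is the measurement operator, $m$ the number of measurements, and $\delta \in (0,1)$ a distortion constant.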
no code implementations • 20 Jun 2024 • Jiachen Jiang, Jinxin Zhou, Zhihui Zhu
Our experimental results on common transformers reveal that representations across layers are positively correlated, although the similarity decreases when layers are far apart.
no code implementations • 10 Jun 2024 • Zhen Qin, Zhihui Zhu
However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression.
no code implementations • 4 Jun 2024 • Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma
The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures.
1 code implementation • 12 Apr 2024 • Tianyu Ding, Jinxin Zhou, Tianyi Chen, Zhihui Zhu, Ilya Zharkov, Luming Liang
Existing angle-based contour descriptors suffer from lossy representation of non-star-convex shapes.
no code implementations • 11 Jan 2024 • Xi Chen, Zhihui Zhu, Andrew Perrault
We study reinforcement learning in the presence of an unknown reward perturbation.
no code implementations • 5 Jan 2024 • Zhen Qin, Michael B. Wakin, Zhihui Zhu
We first delve into the TT factorization problem and establish the local linear convergence of RGD.
2 code implementations • 15 Dec 2023 • Tianyi Chen, Tianyu Ding, Zhihui Zhu, Zeyu Chen, HsiangTao Wu, Ilya Zharkov, Luming Liang
Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial for efficient machine learning.
1 code implementation • 1 Dec 2023 • Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape.
1 code implementation • CVPR 2024 • Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang
We present DREAM, a novel training framework representing Diffusion Rectification and Estimation Adaptive Models, requiring minimal code changes (just three lines) yet significantly enhancing the alignment of training with sampling in diffusion models.
no code implementations • 24 Nov 2023 • Zhen Qin, Xuwei Tan, Zhihui Zhu
Enforcing orthonormality or isometry on the weight matrices has been shown to enhance the training of deep neural networks by mitigating exploding/vanishing gradients and increasing the robustness of the learned networks.
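As a rough illustration of one standard way to encourage near-isometric weights (a generic soft-orthogonality regularizer, not necessarily the approach taken in this paper), one can penalize the deviation of the Gram matrix $WW^\top$ from the identity:

import torch

def soft_orthogonality_penalty(weight: torch.Tensor) -> torch.Tensor:
    # Penalize ||W W^T - I||_F^2 for a weight matrix; conv kernels are flattened to a matrix.
    w = weight.reshape(weight.shape[0], -1)
    gram = w @ w.t()
    eye = torch.eye(gram.shape[0], device=w.device, dtype=w.dtype)
    return ((gram - eye) ** 2).sum()

# Usage (hypothetical): total_loss = task_loss + 1e-4 * soft_orthogonality_penalty(layer.weight)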
1 code implementation • 6 Nov 2023 • Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu
To the best of our knowledge, this is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
no code implementations • 9 Oct 2023 • Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You, Zhihui Zhu
However, most of the existing empirical and theoretical studies in neural collapse focus on the case that the number of classes is small relative to the dimension of the feature space.
1 code implementation • 1 Jun 2023 • Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu
Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers.
1 code implementation • 13 Mar 2023 • Tianyi Chen, Luming Liang, Tianyu Ding, Zhihui Zhu, Ilya Zharkov
We propose the second generation of Only-Train-Once (OTOv2), which automatically trains and compresses a general DNN only once, from scratch, to produce a more compact model with competitive performance and no fine-tuning required.
no code implementations • 25 Jan 2023 • Xiao Li, Zhihui Zhu, Qiuwei Li, Kai Liu
Symmetric nonnegative matrix factorization (NMF), a special but important class of general NMF, has found numerous applications in data analysis, such as various clustering tasks.
no code implementations • 23 Dec 2022 • Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu
In particular, we discovered a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more the pre-trained features collapse on that data, the higher the transfer accuracy.
1 code implementation • 24 Oct 2022 • Xili Dai, Mingyang Li, Pengyuan Zhai, Shengbang Tong, Xingjian Gao, Shao-Lun Huang, Zhihui Zhu, Chong You, Yi Ma
We show that such models have equally strong empirical performance on CIFAR-10, CIFAR-100, and ImageNet datasets when compared to conventional neural networks.
no code implementations • 4 Oct 2022 • Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu
We extend such results and show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse.
no code implementations • 21 Sep 2022 • Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu
This paper studies the problem of recovering a low-rank matrix from several noisy random linear measurements.
1 code implementation • 19 Sep 2022 • Can Yaras, Peng Wang, Zhihui Zhu, Laura Balzano, Qing Qu
When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon.
1 code implementation • 9 Sep 2022 • Tianyu Ding, Luming Liang, Zhihui Zhu, Tianyi Chen, Ilya Zharkov
As a result, we achieve a considerable performance gain with a quarter of the size of the original AdaCoF.
no code implementations • 9 Jul 2022 • Zhen Qin, Alexander Lidiak, Zhexuan Gong, Gongguo Tang, Michael B. Wakin, Zhihui Zhu
Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, overcoming the curse of dimensionality.
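For readers unfamiliar with the format, a tensor $\mathcal{X} \in \mathbb{R}^{d_1 \times \cdots \times d_N}$ in TT form is written entrywise as $\mathcal{X}(i_1, \ldots, i_N) = \mathbf{G}_1(i_1)\mathbf{G}_2(i_2)\cdots\mathbf{G}_N(i_N)$, where each $\mathbf{G}_k(i_k)$ is an $r_{k-1} \times r_k$ matrix slice of the $k$-th core and $r_0 = r_N = 1$; storage then scales with $\sum_k d_k r_{k-1} r_k$ rather than $\prod_k d_k$ (the standard definition, stated here only for context).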
no code implementations • 2 Mar 2022 • Jinxin Zhou, Xiao Li, Tianyu Ding, Chong You, Qing Qu, Zhihui Zhu
When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero.
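For reference, a (general) Simplex ETF with $K$ classes is the standard structure $\mathbf{M} = \sqrt{\tfrac{K}{K-1}}\,\mathbf{P}\big(\mathbf{I}_K - \tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K^\top\big)$, where $\mathbf{P}$ is a partial orthogonal matrix with $\mathbf{P}^\top\mathbf{P} = \mathbf{I}_K$; its columns have equal norms and maximally separated pairwise angles (included here as the textbook definition, for clarity).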
1 code implementation • 28 Feb 2022 • Sheng Liu, Zhihui Zhu, Qing Qu, Chong You
In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
Ranked #1 on Learning with noisy labels on CIFAR-10N-Random3
no code implementations • NeurIPS 2021 • Lijun Ding, Liwei Jiang, Yudong Chen, Qing Qu, Zhihui Zhu
We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge on the intrinsic rank.
1 code implementation • NeurIPS 2021 • Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices.
1 code implementation • NeurIPS 2021 • Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu
In contrast to existing landscape analyses for deep neural networks, which are often disconnected from practice, our analysis of the simplified model not only explains what kind of features are learned in the last layer but also shows why they can be efficiently optimized in the simplified settings, matching the empirical observations in practical deep network architectures.
1 code implementation • CVPR 2021 • Tianyu Ding, Luming Liang, Zhihui Zhu, Ilya Zharkov
DNN-based frame interpolation, which generates intermediate frames given two consecutive frames, typically relies on heavy model architectures with a huge number of features, preventing them from being deployed on systems with limited resources, e.g., mobile devices.
Ranked #1 on Video Frame Interpolation on Middlebury (LPIPS metric)
1 code implementation • NeurIPS 2021 • Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu
Furthermore, we show that our ConvNorm can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network, leading to easier training and improved robustness for deep ConvNets.
no code implementations • 1 Jan 2021 • Tianyi Chen, Guanyi Wang, Tianyu Ding, Bo Ji, Sheng Yi, Zhihui Zhu
Optimizing with group sparsity is significant for enhancing model interpretability in machine learning applications, e.g., feature selection, compressed sensing, and model compression.
1 code implementation • NeurIPS 2020 • Chong You, Zhihui Zhu, Qing Qu, Yi Ma
This paper shows that with a double over-parameterization of both the low-rank matrix and the sparse corruption, gradient descent with discrepant learning rates provably recovers the underlying matrix even without prior knowledge of either the rank of the matrix or the sparsity of the corruption.
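As a hedged sketch of what such a double over-parameterization can look like (the paper's exact parameterization may differ), one writes the low-rank part as a factored difference and the sparse corruption as a Hadamard difference of squares, e.g. minimizing $\big\|\mathcal{A}(\mathbf{U}\mathbf{U}^\top - \mathbf{V}\mathbf{V}^\top) + \mathbf{g}\odot\mathbf{g} - \mathbf{h}\odot\mathbf{h} - \mathbf{y}\big\|_2^2$ over $(\mathbf{U}, \mathbf{V}, \mathbf{g}, \mathbf{h})$, and then runs gradient descent with one learning rate on the low-rank factors and a different (discrepant) rate on the sparse factors, so that implicit regularization selects a low-rank matrix plus a sparse corruption.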
no code implementations • 11 Jun 2020 • Jeremias Sulam, Chong You, Zhihui Zhu
We thoroughly demonstrate this observation in practice and provide an analysis of this phenomenon by tying recovery measures to generalization bounds.
no code implementations • ICLR 2020 • Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu
Learning overcomplete representations finds many applications in machine learning and data analytics.
1 code implementation • 7 Apr 2020 • Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu
Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression.
no code implementations • 20 Jan 2020 • Qing Qu, Zhihui Zhu, Xiao Li, Manolis C. Tsakiris, John Wright, René Vidal
The problem of finding the sparsest vector (direction) in a low dimensional subspace can be considered as a homogeneous variant of the sparse recovery problem, which finds applications in robust subspace recovery, dictionary learning, sparse blind deconvolution, and many other problems in signal processing and machine learning.
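Concretely, if the columns of $\mathbf{Q}$ form a basis for the subspace, a standard (nonconvex) formulation of this problem is $\min_{\mathbf{z}} \|\mathbf{Q}\mathbf{z}\|_1$ subject to $\|\mathbf{z}\|_2 = 1$, i.e., searching over unit-norm coefficient vectors for the direction $\mathbf{q} = \mathbf{Q}\mathbf{z}$ in the subspace with the fewest nonzero entries (a common formulation given for orientation, not a quotation from the paper).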
no code implementations • 5 Dec 2019 • Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu
In this work, we show these problems can be formulated as $\ell^4$-norm optimization problems with spherical constraint, and study the geometric properties of their nonconvex optimization landscapes.
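A representative template of such a formulation (given here only as an illustration of the problem class) is $\max_{\mathbf{q}} \|\mathbf{q}^\top \mathbf{Y}\|_4^4$ subject to $\|\mathbf{q}\|_2 = 1$, where maximizing the $\ell^4$ norm over the sphere promotes spiky, sparse correlations between the direction $\mathbf{q}$ and the data $\mathbf{Y}$.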
1 code implementation • 3 Dec 2019 • Zhihui Zhu, Xinyang Jiang, Feng Zheng, Xiaowei Guo, Feiyue Huang, Wei-Shi Zheng, Xing Sun
Instead of one subspace for each viewpoint, our method projects features from different viewpoints onto a unified hypersphere and effectively models the feature distribution at both the identity level and the viewpoint level.
Ranked #7 on Person Re-Identification on DukeMTMC-reID (using extra training data)
no code implementations • NeurIPS 2019 • Zhihui Zhu, Tianyu Ding, Daniel Robinson, Manolis Tsakiris, René Vidal
Minimizing a non-smooth function over the Grassmannian appears in many applications in machine learning.
1 code implementation • NeurIPS 2019 • Zhihui Zhu, Qiuwei Li, Xinshuo Yang, Gongguo Tang, Michael B. Wakin
Low-rank matrix factorization is a problem of broad importance, owing to the ubiquity of low-rank models in machine learning contexts.
1 code implementation • 12 Nov 2019 • Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man-Cho So
To the best of our knowledge, these are the first convergence guarantees for using Riemannian subgradient-type methods to optimize a class of nonconvex nonsmooth functions over the Stiefel manifold.
1 code implementation • NeurIPS 2019 • Qing Qu, Xiao Li, Zhihui Zhu
We study the multi-channel sparse blind deconvolution (MCS-BD) problem, whose task is to simultaneously recover a kernel $\mathbf a$ and multiple sparse inputs $\{\mathbf x_i\}_{i=1}^p$ from their circulant convolution $\mathbf y_i = \mathbf a \circledast \mathbf x_i $ ($i=1,\cdots, p$).
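To make the measurement model concrete, circulant convolution can be evaluated with the FFT; the following minimal NumPy sketch simulates the forward model only (it is not the recovery algorithm proposed in the paper, and the sparsity level is an arbitrary choice):

import numpy as np

def circulant_conv(a, x):
    # Circulant (cyclic) convolution y = a ⊛ x, computed in the Fourier domain.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(x)))

rng = np.random.default_rng(0)
n, p, sparsity = 64, 5, 0.1
a = rng.standard_normal(n)                                                            # shared kernel
xs = [np.where(rng.random(n) < sparsity, rng.standard_normal(n), 0.0) for _ in range(p)]  # sparse inputs x_i
ys = [circulant_conv(a, x) for x in xs]                                                # observed channels y_i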
no code implementations • 22 Apr 2019 • Qiuwei Li, Zhihui Zhu, Gongguo Tang, Michael B. Wakin
Therefore, this work not only develops guaranteed optimization methods for non-Lipschitz smooth problems but also resolves an open problem by establishing second-order convergence guarantees for these alternating minimization methods.
no code implementations • 24 Dec 2018 • Zhihui Zhu, Yifan Wang, Daniel P. Robinson, Daniel Q. Naiman, Rene Vidal, Manolis C. Tsakiris
However, its geometric analysis is based on quantities that are difficult to interpret and are not amenable to statistical analysis.
no code implementations • NeurIPS 2018 • Zhihui Zhu, Yifan Wang, Daniel Robinson, Daniel Naiman, Rene Vidal, Manolis Tsakiris
However, its geometric analysis is based on quantities that are difficult to interpret and are not amenable to statistical analysis.
no code implementations • NeurIPS 2018 • Zhihui Zhu, Xiao Li, Kai Liu, Qiuwei Li
Symmetric nonnegative matrix factorization (NMF), a special but important class of general NMF, has proven useful for data analysis, in particular for various clustering tasks.
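A generic baseline for symmetric NMF, $\min_{U \ge 0} \|X - UU^\top\|_F^2$, is projected gradient descent; the sketch below is that standard baseline, not necessarily the algorithm analyzed in the paper (step size and iteration count are arbitrary):

import numpy as np

def symnmf_pgd(X, r, steps=500, lr=1e-3, seed=0):
    # Projected gradient descent for min_{U >= 0} ||X - U U^T||_F^2.
    rng = np.random.default_rng(seed)
    U = np.abs(rng.standard_normal((X.shape[0], r))) * 0.1
    for _ in range(steps):
        grad = 4.0 * (U @ U.T - X) @ U        # gradient of the squared Frobenius loss
        U = np.maximum(U - lr * grad, 0.0)    # project onto the nonnegative orthant
    return U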
no code implementations • 7 Nov 2018 • Zhihui Zhu, Qiuwei Li, Xinshuo Yang, Gongguo Tang, Michael B. Wakin
We study the convergence of a variant of distributed gradient descent (DGD) on a distributed low-rank matrix approximation problem wherein some optimization variables are used for consensus (as in classical DGD) and some optimization variables appear only locally at a single node in the network.
no code implementations • 24 Sep 2018 • Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Rene Vidal
In this paper we study the problem of recovering a low-rank matrix from a number of random linear measurements that are corrupted by outliers taking arbitrary values.
Information Theory
no code implementations • 13 May 2018 • Zhihui Zhu, Daniel Soudry, Yonina C. Eldar, Michael B. Wakin
We examine the squared error loss landscape of shallow linear neural networks.
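Concretely, the landscape in question is that of $\min_{\mathbf{W}_1, \mathbf{W}_2} \|\mathbf{Y} - \mathbf{W}_2\mathbf{W}_1\mathbf{X}\|_F^2$, the squared error of a one-hidden-layer linear network; the paper's exact setting (e.g., constraints or dimension assumptions) may add further structure.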
no code implementations • 19 Sep 2017 • Tao Hong, Xiao Li, Zhihui Zhu, Qiuwei Li
We consider designing a robust structured sparse sensing matrix consisting of a sparse matrix with a few non-zero entries per row and a dense base matrix for capturing signals efficiently. We design this sensing matrix by minimizing the distance between the Gram matrix of the equivalent dictionary and a target Gram matrix with small mutual coherence.
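The design objective can be pictured as follows; this minimal sketch only evaluates the criterion for a candidate sensing matrix Phi, dictionary Psi, and a given target Gram matrix G_target (all hypothetical inputs), and leaves the actual optimizer and the sparse-plus-dense structure abstract:

import numpy as np

def gram_distance(Phi, Psi, G_target):
    # Frobenius distance between the Gram matrix of the equivalent dictionary
    # Phi @ Psi (with unit-norm columns) and a target Gram with low mutual coherence.
    D = Phi @ Psi
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    return float(np.linalg.norm(D.T @ D - G_target, "fro"))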
no code implementations • 5 Apr 2017 • Qiuwei Li, Zhihui Zhu, Gongguo Tang
In spite of the nonconvexity of the factored formulation, we prove that when the convex loss function $f(X)$ is $(2r, 4r)$-restricted well-conditioned, each critical point of the factored problem either corresponds to the optimal solution $X^\star$ of the original convex optimization or is a strict saddle point where the Hessian matrix has a strictly negative eigenvalue.
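For orientation, the factored formulation replaces the rank-constrained convex program with optimization over a factor, schematically $\min_{X \succeq 0,\ \mathrm{rank}(X) \le r} f(X) \ \rightarrow\ \min_{U \in \mathbb{R}^{n \times r}} f(UU^\top)$ (sketched here for the symmetric positive semidefinite case; the paper may treat a more general setting); the result quoted above then classifies every critical point of the factored problem as either globally optimal, with $UU^\top = X^\star$, or a strict saddle.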
1 code implementation • 4 Jan 2017 • Tao Hong, Zhihui Zhu
Simulation results on natural images demonstrate the effectiveness of the proposed online algorithm compared with existing methods.
1 code implementation • 27 Sep 2016 • Tao Hong, Zhihui Zhu
Without requiring training data, we can efficiently design the robust projection matrix and apply it to most CS systems, such as a CS system for image processing with a conventional wavelet dictionary, in which the SRE matrix is generally not available.