Search Results for author: Zhihui Zhu

Found 54 papers, 23 papers with code

Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data

no code implementations • 22 Oct 2024 • Xinyi Ling, Bo Peng, Hanwen Du, Zhihui Zhu, Xia Ning

Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community.

Robust Low-rank Tensor Train Recovery

no code implementations • 19 Oct 2024 • Zhen Qin, Zhihui Zhu

We first establish the $\ell_1/\ell_2$-restricted isometry property (RIP) for Gaussian measurement operators, demonstrating that the information in the TT format tensor can be preserved using a number of measurements that grows linearly with $N$.
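
For orientation, an $\ell_1/\ell_2$-RIP of this kind typically asserts that there exist constants $0 < c_1 \le c_2$ such that $c_1\|\mathcal{X}\|_F \le \frac{1}{m}\|\mathcal{A}(\mathcal{X})\|_1 \le c_2\|\mathcal{X}\|_F$ for every tensor $\mathcal{X}$ in TT format with the given ranks, where $\mathcal{A}$ is the Gaussian measurement operator and $m$ the number of measurements. The constants here are illustrative, not the paper's.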

On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier

no code implementations • 20 Jun 2024 • Jiachen Jiang, Jinxin Zhou, Zhihui Zhu

Our experimental results on common transformers reveal that representations across layers are positively correlated, albeit the similarity decreases when layers are far apart.
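
The snippet above does not say which similarity measure is used across layers; a common choice for this kind of comparison is linear CKA, sketched below in numpy. The metric is an assumption for illustration, not the paper's stated method.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between feature matrices X, Y of shape (n_samples, dim)."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```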

Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition

no code implementations • 10 Jun 2024 • Zhen Qin, Zhihui Zhu

However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression.

Computational Efficiency, regression

A Global Geometric Analysis of Maximal Coding Rate Reduction

no code implementations • 4 Jun 2024 • Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, Yi Ma

The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures.
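
For reference, the MCR$^2$ objective is usually written as a difference of coding rates, $\Delta R(Z) = R(Z) - R_c(Z; \Pi)$. A minimal numpy sketch of the commonly stated form follows; the quantization level eps and the normalizations follow the usual formulation and are illustrative, not taken from this paper.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 logdet(I + d/(n eps^2) Z Z^T) for features Z of shape (d, n)."""
    d, n = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)[1]

def mcr2(Z, labels, eps=0.5):
    """Delta R: global coding rate minus the class-conditional coding rates."""
    d, n = Z.shape
    r_c = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        nc = Zc.shape[1]
        r_c += (nc / (2.0 * n)) * np.linalg.slogdet(
            np.eye(d) + (d / (nc * eps**2)) * Zc @ Zc.T)[1]
    return coding_rate(Z, eps) - r_c
```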

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

no code implementations • 5 Jan 2024 • Zhen Qin, Michael B. Wakin, Zhihui Zhu

We first delve into the TT factorization problem and establish the local linear convergence of RGD.

OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators

2 code implementations • 15 Dec 2023 • Tianyi Chen, Tianyu Ding, Zhihui Zhu, Zeyu Chen, HsiangTao Wu, Ilya Zharkov, Luming Liang

Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial in the efficient machine learning realm.

Neural Architecture Search

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

1 code implementation • 1 Dec 2023 • Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang

The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape.

Model Compression, Survey

DREAM: Diffusion Rectification and Estimation-Adaptive Models

1 code implementation • CVPR 2024 • Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang

We present DREAM, a novel training framework representing Diffusion Rectification and Estimation Adaptive Models, requiring minimal code changes (just three lines) yet significantly enhancing the alignment of training with sampling in diffusion models.

Image Super-Resolution

Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks

no code implementations • 24 Nov 2023 • Zhen Qin, Xuwei Tan, Zhihui Zhu

Enforcing orthonormal or isometric property for the weight matrices has been shown to enhance the training of deep neural networks by mitigating gradient exploding/vanishing and increasing the robustness of the learned networks.
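
One off-the-shelf way to enforce such a constraint during training (not necessarily the scheme analyzed in the paper) is PyTorch's built-in orthogonal parametrization:

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations

layer = nn.Linear(128, 128, bias=False)
parametrizations.orthogonal(layer, "weight")  # weight is reparametrized to stay orthogonal

W = layer.weight
print(torch.allclose(W @ W.T, torch.eye(128), atol=1e-5))  # True: rows are orthonormal
```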

Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination

1 code implementation • 6 Nov 2023 • Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu

To the best of our knowledge, this is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.

Feature Compression, Multi-class Classification, +2

Generalized Neural Collapse for a Large Number of Classes

no code implementations • 9 Oct 2023 • Jiachen Jiang, Jinxin Zhou, Peng Wang, Qing Qu, Dustin Mixon, Chong You, Zhihui Zhu

However, most of the existing empirical and theoretical studies in neural collapse focus on the case that the number of classes is small relative to the dimension of the feature space.

Face Recognition, Retrieval

The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks

1 code implementation • 1 Jun 2023 • Can Yaras, Peng Wang, Wei Hu, Zhihui Zhu, Laura Balzano, Qing Qu

Second, it allows us to better understand deep representation learning by elucidating the linear progressive separation and concentration of representations from shallow to deep layers.

Representation Learning

OTOV2: Automatic, Generic, User-Friendly

1 code implementation • 13 Mar 2023 • Tianyi Chen, Luming Liang, Tianyu Ding, Zhihui Zhu, Ilya Zharkov

We propose the second generation of Only-Train-Once (OTOv2), which first automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance without fine-tuning.

Model Compression

A Provable Splitting Approach for Symmetric Nonnegative Matrix Factorization

no code implementations • 25 Jan 2023 • Xiao Li, Zhihui Zhu, Qiuwei Li, Kai Liu

The symmetric Nonnegative Matrix Factorization (NMF), a special but important class of the general NMF, has found numerous applications in data analysis such as various clustering tasks.

Clustering, Image Clustering, +1

Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

no code implementations • 23 Dec 2022 • Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu

In particular, we discovered a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more the features of a pre-trained model collapse on the downstream training data, the higher the transfer accuracy.

Data Augmentation, parameter-efficient fine-tuning, +2
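
The snippet above does not define the collapse measure. One simple proxy used in the neural-collapse literature is the ratio of within-class to between-class feature scatter, sketched here for illustration; the paper's exact metric may differ.

```python
import numpy as np

def feature_collapse(features, labels):
    """Within-class over between-class scatter; smaller means more collapse."""
    mu = features.mean(axis=0)
    within = between = 0.0
    for c in np.unique(labels):
        Fc = features[labels == c]
        mc = Fc.mean(axis=0)
        within += ((Fc - mc) ** 2).sum()
        between += len(Fc) * ((mc - mu) ** 2).sum()
    return within / between
```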

Revisiting Sparse Convolutional Model for Visual Recognition

1 code implementation • 24 Oct 2022 • Xili Dai, Mingyang Li, Pengyuan Zhai, Shengbang Tong, Xingjian Gao, Shao-Lun Huang, Zhihui Zhu, Chong You, Yi Ma

We show that such models have equally strong empirical performance on CIFAR-10, CIFAR-100, and ImageNet datasets when compared to conventional neural networks.

Image Classification

Are All Losses Created Equal: A Neural Collapse Perspective

no code implementations • 4 Oct 2022 • Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu

We extend such results and show, through global solution and landscape analyses, that a broad family of loss functions, including the commonly used label smoothing (LS) and focal loss (FL), exhibits Neural Collapse.
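
For reference, these are the standard definitions: with $K$ classes, label smoothing replaces the one-hot target $y$ by $(1-\alpha)\,y + \alpha/K$ inside the cross-entropy, and the focal loss on the true-class probability $p_t$ reads $\mathrm{FL}(p_t) = -(1-p_t)^{\gamma}\log p_t$, where $\alpha$ and $\gamma$ are their usual hyperparameters.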

A Validation Approach to Over-parameterized Matrix and Image Recovery

no code implementations • 21 Sep 2022 • Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu

This paper studies the problem of recovering a low-rank matrix from several noisy random linear measurements.

Image Restoration

Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold

1 code implementation • 19 Sep 2022 • Can Yaras, Peng Wang, Zhihui Zhu, Laura Balzano, Qing Qu

When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon.

Multi-class Classification, Representation Learning, +1

Sparsity-guided Network Design for Frame Interpolation

1 code implementation • 9 Sep 2022 • Tianyu Ding, Luming Liang, Zhihui Zhu, Tianyi Chen, Ilya Zharkov

As a result, we achieve a considerable performance gain with a quarter of the size of the original AdaCoF.

Error Analysis of Tensor-Train Cross Approximation

no code implementations • 9 Jul 2022 • Zhen Qin, Alexander Lidiak, Zhexuan Gong, Gongguo Tang, Michael B. Wakin, Zhihui Zhu

Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, overcoming the curse of dimensionality.
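
In the standard notation, a TT-format tensor is written entrywise as $\mathcal{X}(i_1,\dots,i_N) = \mathbf{G}_1(i_1)\mathbf{G}_2(i_2)\cdots\mathbf{G}_N(i_N)$, where each $\mathbf{G}_k(i_k)$ is an $r_{k-1}\times r_k$ matrix slice of the $k$-th core and $r_0 = r_N = 1$, so storage grows linearly in $N$ for bounded ranks rather than exponentially.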

On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

no code implementations • 2 Mar 2022 • Jinxin Zhou, Xiao Li, Tianyu Ding, Chong You, Qing Qu, Zhihui Zhu

When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero.
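
For concreteness, the $K$ vertices of a simplex ETF can be written (up to rotation and scaling) as the columns of $M = \sqrt{K/(K-1)}\, P \left(I_K - \frac{1}{K}\mathbf{1}_K\mathbf{1}_K^\top\right)$, where $P$ has orthonormal columns; this is the usual definition in the neural-collapse literature.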

Robust Training under Label Noise by Over-parameterization

1 code implementation • 28 Feb 2022 • Sheng Liu, Zhihui Zhu, Qing Qu, Chong You

In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.

Learning with noisy labels
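
A heavily simplified sketch of the general idea: per-sample noise variables in a quadratic over-parameterization, whose gradient-descent dynamics are biased toward sparse corrections. The variable names, loss, and initialization below are illustrative and differ from the paper's exact formulation.

```python
import torch

def robust_loss(logits, targets_onehot, u, v):
    """Per-sample correction s = u*u - v*v absorbs (sparse) label corruption."""
    s = u * u - v * v  # (n, k); the implicit bias of GD keeps s sparse
    return ((logits + s - targets_onehot) ** 2).mean()

# u, v are trained jointly with the network, starting from a small initialization:
n, k = 1024, 10
u = torch.full((n, k), 1e-3, requires_grad=True)
v = torch.full((n, k), 1e-3, requires_grad=True)
```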

Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery

no code implementations • NeurIPS 2021 • Lijun Ding, Liwei Jiang, Yudong Chen, Qing Qu, Zhihui Zhu

We study the robust recovery of a low-rank matrix from sparsely and grossly corrupted Gaussian measurements, with no prior knowledge on the intrinsic rank.
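
A minimal numpy sketch of a subgradient method for this setting: minimize the $\ell_1$ loss over an over-specified factor with a geometrically decaying step size. The schedule and initialization are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def subgrad_l1_recovery(A, y, n, r_over, steps=500, lr0=0.1, q=0.98):
    """Minimize ||A vec(U U^T) - y||_1 over U of shape (n, r_over)."""
    U = 0.01 * np.random.randn(n, r_over)
    for t in range(steps):
        resid = A @ (U @ U.T).ravel() - y
        G = (A.T @ np.sign(resid)).reshape(n, n)  # subgradient w.r.t. X = U U^T
        U -= lr0 * (q ** t) * (G + G.T) @ U       # chain rule through X = U U^T
    return U @ U.T
```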

Only Train Once: A One-Shot Neural Network Training And Pruning Framework

1 code implementation • NeurIPS 2021 • Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices.

A Geometric Analysis of Neural Collapse with Unconstrained Features

1 code implementation • NeurIPS 2021 • Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu

In contrast to existing landscape analyses for deep neural networks, which are often disconnected from practice, our analysis of the simplified model not only explains what kind of features are learned in the last layer, but also shows why they can be efficiently optimized in the simplified settings, matching the empirical observations in practical deep network architectures.

CDFI: Compression-Driven Network Design for Frame Interpolation

1 code implementation • CVPR 2021 • Tianyu Ding, Luming Liang, Zhihui Zhu, Ilya Zharkov

DNN-based frame interpolation, which generates the intermediate frames given two consecutive frames, typically relies on heavy model architectures with a huge number of features, preventing them from being deployed on systems with limited resources, e.g., mobile devices.

Ranked #1 on Video Frame Interpolation on Middlebury (LPIPS metric)

Video Frame Interpolation

Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

1 code implementation • NeurIPS 2021 • Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu

Furthermore, we show that our ConvNorm can reduce the layerwise spectral norm of the weight matrices and hence improve the Lipschitzness of the network, leading to easier training and improved robustness for deep ConvNets.

Generative Adversarial Network

A Half-Space Stochastic Projected Gradient Method for Group Sparsity Regularization

no code implementations • 1 Jan 2021 • Tianyi Chen, Guanyi Wang, Tianyu Ding, Bo Ji, Sheng Yi, Zhihui Zhu

Optimizing with group sparsity is significant in enhancing model interpretability in machine learning applications, e.g., feature selection, compressed sensing and model compression.

feature selection, Model Compression, +1
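
The group-sparsity regularizer in question is typically the mixed-norm penalty $\Omega(x) = \lambda \sum_{g \in \mathcal{G}} \|x_g\|_2$, which zeroes out entire groups of variables $x_g$ at once; this standard form is stated here only for context.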

Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization

1 code implementation • NeurIPS 2020 • Chong You, Zhihui Zhu, Qing Qu, Yi Ma

This paper shows that with a double over-parameterization for both the low-rank matrix and the sparse corruption, gradient descent with discrepant learning rates provably recovers the underlying matrix even without prior knowledge of either the rank of the matrix or the sparsity of the corruption.

Recovery and Generalization in Over-Realized Dictionary Learning

no code implementations • 11 Jun 2020 • Jeremias Sulam, Chong You, Zhihui Zhu

We thoroughly demonstrate this observation in practice and provide an analysis of this phenomenon by tying recovery measures to generalization bounds.

Dictionary Learning, Generalization Bounds

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

1 code implementation • 7 Apr 2020 • Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression.

feature selection, Model Compression
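
For background, proximal methods for $\ell_1$-regularized problems build on the soft-thresholding operator below; the paper's orthant-based variant differs in how it fixes signs, so this is the classical building block rather than the proposed method.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```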

Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications

no code implementations • 20 Jan 2020 • Qing Qu, Zhihui Zhu, Xiao Li, Manolis C. Tsakiris, John Wright, René Vidal

The problem of finding the sparsest vector (direction) in a low dimensional subspace can be considered as a homogeneous variant of the sparse recovery problem, which finds applications in robust subspace recovery, dictionary learning, sparse blind deconvolution, and many other problems in signal processing and machine learning.

Dictionary Learning, Representation Learning

Analysis of the Optimization Landscapes for Overcomplete Representation Learning

no code implementations • 5 Dec 2019 • Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu

In this work, we show these problems can be formulated as $\ell^4$-norm optimization problems with spherical constraint, and study the geometric properties of their nonconvex optimization landscapes.

Representation Learning
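
A representative instance of such problems is $\max_{\|q\|_2 = 1} \|Y^\top q\|_4^4$, whose maximizers align $q$ with sparsely used directions of the data $Y$; the objectives in the paper may differ in normalization and constraints.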

Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification

1 code implementation • 3 Dec 2019 • Zhihui Zhu, Xinyang Jiang, Feng Zheng, Xiaowei Guo, Feiyue Huang, Wei-Shi Zheng, Xing Sun

Instead of one subspace for each viewpoint, our method projects the feature from different viewpoints into a unified hypersphere and effectively models the feature distribution on both the identity-level and the viewpoint-level.

Ranked #7 on Person Re-Identification on DukeMTMC-reID (using extra training data)

Person Re-Identification

Distributed Low-rank Matrix Factorization With Exact Consensus

1 code implementation • NeurIPS 2019 • Zhihui Zhu, Qiuwei Li, Xinshuo Yang, Gongguo Tang, Michael B. Wakin

Low-rank matrix factorization is a problem of broad importance, owing to the ubiquity of low-rank models in machine learning contexts.

Weakly Convex Optimization over Stiefel Manifold Using Riemannian Subgradient-Type Methods

1 code implementation • 12 Nov 2019 • Xiao Li, Shixiang Chen, Zengde Deng, Qing Qu, Zhihui Zhu, Anthony Man-Cho So

To the best of our knowledge, these are the first convergence guarantees for using Riemannian subgradient-type methods to optimize a class of nonconvex nonsmooth functions over the Stiefel manifold.

Dictionary Learning Vocal Bursts Type Prediction
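
A generic step of such a method: project the Euclidean subgradient onto the tangent space of the Stiefel manifold, then retract via QR. This is one standard scheme in the class the paper analyzes, not its specific algorithm.

```python
import numpy as np

def stiefel_subgrad_step(X, euclid_grad, lr):
    """One Riemannian subgradient-type step on {X : X^T X = I}."""
    XtG = X.T @ euclid_grad
    rgrad = euclid_grad - X @ (XtG + XtG.T) / 2  # tangent-space projection
    Q, _ = np.linalg.qr(X - lr * rgrad)          # QR retraction back to the manifold
    return Q
```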

A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution

1 code implementation • NeurIPS 2019 • Qing Qu, Xiao Li, Zhihui Zhu

We study the multi-channel sparse blind deconvolution (MCS-BD) problem, whose task is to simultaneously recover a kernel $\mathbf a$ and multiple sparse inputs $\{\mathbf x_i\}_{i=1}^p$ from their circulant convolution $\mathbf y_i = \mathbf a \circledast \mathbf x_i $ ($i=1,\cdots, p$).

Computational Efficiency

Provable Bregman-divergence based Methods for Nonconvex and Non-Lipschitz Problems

no code implementations • 22 Apr 2019 • Qiuwei Li, Zhihui Zhu, Gongguo Tang, Michael B. Wakin

Therefore, this work not only develops guaranteed optimization methods for non-Lipschitz smooth problems but also solves an open problem of showing the second-order convergence guarantees for these alternating minimization methods.

Dual Principal Component Pursuit: Probability Analysis and Efficient Algorithms

no code implementations • 24 Dec 2018 • Zhihui Zhu, Yifan Wang, Daniel P. Robinson, Daniel Q. Naiman, Rene Vidal, Manolis C. Tsakiris

However, its geometric analysis is based on quantities that are difficult to interpret and are not amenable to statistical analysis.

Dual Principal Component Pursuit: Improved Analysis and Efficient Algorithms

no code implementations • NeurIPS 2018 • Zhihui Zhu, Yifan Wang, Daniel Robinson, Daniel Naiman, Rene Vidal, Manolis Tsakiris

However, its geometric analysis is based on quantities that are difficult to interpret and are not amenable to statistical analysis.
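
For context, DPCP seeks a normal vector to the inlier subspace by solving the sphere-constrained problem $\min_{\|b\|_2 = 1} \|Y^\top b\|_1$; this is the standard DPCP formulation.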

Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization

no code implementations • NeurIPS 2018 • Zhihui Zhu, Xiao Li, Kai Liu, Qiuwei Li

Symmetric nonnegative matrix factorization (NMF), a special but important class of the general NMF, is demonstrated to be useful for data analysis and in particular for various clustering tasks.

Clustering, Image Clustering
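
The trick in the title is commonly formalized by splitting the symmetric factor in two and penalizing their difference, $\min_{U,V \ge 0} \|X - UV^\top\|_F^2 + \lambda \|U - V\|_F^2$, so that fast alternating updates for nonsymmetric NMF become applicable; $\lambda$ is a tuning parameter, and the exact penalty form here is the usual one rather than a quote from the paper.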

Global Optimality in Distributed Low-rank Matrix Factorization

no code implementations • 7 Nov 2018 • Zhihui Zhu, Qiuwei Li, Xinshuo Yang, Gongguo Tang, Michael B. Wakin

We study the convergence of a variant of distributed gradient descent (DGD) on a distributed low-rank matrix approximation problem wherein some optimization variables are used for consensus (as in classical DGD) and some optimization variables appear only locally at a single node in the network.

Nonconvex Robust Low-rank Matrix Recovery

no code implementations • 24 Sep 2018 • Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Rene Vidal

In this paper we study the problem of recovering a low-rank matrix from a number of random linear measurements that are corrupted by outliers taking arbitrary values.

Information Theory

The Global Optimization Geometry of Shallow Linear Neural Networks

no code implementations • 13 May 2018 • Zhihui Zhu, Daniel Soudry, Yonina C. Eldar, Michael B. Wakin

We examine the squared error loss landscape of shallow linear neural networks.

Optimized Structured Sparse Sensing Matrices for Compressive Sensing

no code implementations • 19 Sep 2017 • Tao Hong, Xiao Li, Zhihui Zhu, Qiuwei Li

We consider designing a robust structured sparse sensing matrix consisting of a sparse matrix with a few non-zero entries per row and a dense base matrix for capturing signals efficiently. We design the robust structured sparse sensing matrix by minimizing the distance between the Gram matrix of the equivalent dictionary and a target Gram matrix with small mutual coherence.

Compressive Sensing, Image Compression

Geometry of Factored Nuclear Norm Regularization

no code implementations • 5 Apr 2017 • Qiuwei Li, Zhihui Zhu, Gongguo Tang

In spite of the nonconvexity of the factored formulation, we prove that when the convex loss function $f(X)$ is $(2r, 4r)$-restricted well-conditioned, each critical point of the factored problem either corresponds to the optimal solution $X^\star$ of the original convex optimization or is a strict saddle point where the Hessian matrix has a strictly negative eigenvalue.
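
The factored formulation rests on the variational identity $\|X\|_* = \min_{X = UV^\top} \frac{1}{2}(\|U\|_F^2 + \|V\|_F^2)$, which turns nuclear-norm regularization $\min_X f(X) + \lambda\|X\|_*$ into $\min_{U,V} f(UV^\top) + \frac{\lambda}{2}(\|U\|_F^2 + \|V\|_F^2)$, the problem whose critical points the result above characterizes.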

Online Learning Sensing Matrix and Sparsifying Dictionary Simultaneously for Compressive Sensing

1 code implementation • 4 Jan 2017 • Tao Hong, Zhihui Zhu

The simulation results on natural images demonstrate the effectiveness of the suggested online algorithm compared with the existing methods.

Compressive Sensing

An Efficient Method for Robust Projection Matrix Design

1 code implementation • 27 Sep 2016 • Tao Hong, Zhihui Zhu

Without requiring training data, we can efficiently design the robust projection matrix and apply it to most CS systems, like a CS system for image processing with a conventional wavelet dictionary, in which the SRE matrix is generally not available.

Compressive Sensing
