Search Results for author: Huishuai Zhang

Found 55 papers, 17 papers with code

©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model

no code implementations18 Apr 2024 Chao Zhou, Huishuai Zhang, Jiang Bian, Weiming Zhang, Nenghai Yu

To mitigate this, we propose the \copyright Plug-in Authorization framework, introducing three operations: addition, extraction, and combination.

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

no code implementations22 Mar 2024 Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

1 code implementation4 Mar 2024 Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models.

Privacy Preserving

Exploring Transferability for Randomized Smoothing

no code implementations14 Dec 2023 Kai Qiu, Huishuai Zhang, Zhirong Wu, Stephen Lin

However, the model robustness, which is a critical aspect for safety, is often optimized for each specific task rather than at the pretraining stage.

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

no code implementations25 Nov 2023 Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun

Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory still remains elusive.

On the Generalization Properties of Diffusion Models

1 code implementation NeurIPS 2023 Puheng Li, Zhong Li, Huishuai Zhang, Jiang Bian

This precisely elucidates the adverse effect of "modes shift" in ground truths on the model generalization.

Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity

no code implementations27 Oct 2023 Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen

Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization under an $L$-smooth condition and a bounded noise variance assumption.

LEMMA valid

When and Why Momentum Accelerates SGD: An Empirical Study

no code implementations15 Jun 2023 Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Nanning Zheng

In the comparison of SGDM and SGD with the same effective learning rate and the same batch size, we observe a consistent pattern: when $\eta_{ef}$ is small, SGDM and SGD experience almost the same empirical training losses; when $\eta_{ef}$ surpasses a certain threshold, SGDM begins to perform better than SGD.
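A minimal sketch of this comparison on a toy noisy quadratic, assuming the usual definition $\eta_{ef} = \eta/(1-\beta)$ for heavy-ball momentum; the objective, constants, and step counts below are illustrative and not the paper's experimental setup.

```python
import numpy as np

# Compare SGD and SGDM at the same effective learning rate eta_ef = eta / (1 - beta)
# on a noisy ill-conditioned quadratic (illustrative toy setting only).
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])                 # quadratic 0.5 * x^T A x

def noisy_grad(x):
    return A @ x + 0.1 * rng.standard_normal(2)

def run(eta, beta, steps=500):
    x, m = np.array([5.0, 5.0]), np.zeros(2)
    for _ in range(steps):
        m = beta * m + noisy_grad(x)     # heavy-ball momentum buffer
        x = x - eta * m
    return 0.5 * x @ A @ x

eta_ef, beta = 0.05, 0.9
loss_sgd = run(eta=eta_ef, beta=0.0)                 # plain SGD at eta_ef
loss_sgdm = run(eta=eta_ef * (1 - beta), beta=beta)  # SGDM with the same eta_ef
print(f"SGD  loss: {loss_sgd:.4f}")
print(f"SGDM loss: {loss_sgdm:.4f}")
```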

UADB: Unsupervised Anomaly Detection Booster

1 code implementation3 Jun 2023 Hangting Ye, Zhining Liu, Xinyi Shen, Wei Cao, Shun Zheng, Xiaofan Gui, Huishuai Zhang, Yi Chang, Jiang Bian

This is a challenging task given the heterogeneous model structures and assumptions adopted by existing UAD methods.

Unsupervised Anomaly Detection

Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions

no code implementations29 May 2023 Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen

We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions.
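A minimal sketch of the (diagonal) AdaGrad update analyzed here; the toy objective and step size are illustrative, and the affine-noise-variance and bounded-smoothness assumptions of the proof are not modeled.

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.1, eps=1e-8, steps=1000):
    """Diagonal AdaGrad: accumulate squared gradients, scale steps coordinate-wise."""
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        v += g * g                           # running sum of squared gradients
        x -= lr * g / (np.sqrt(v) + eps)     # coordinate-wise adaptive step
    return x

# toy non-convex objective: f(x) = sum(x^2 * sin(x)^2)
grad = lambda x: 2 * x * np.sin(x) ** 2 + 2 * x ** 2 * np.sin(x) * np.cos(x)
print(adagrad(grad, x0=[2.0, -3.0]))
```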

Selective Pre-training for Private Fine-tuning

1 code implementation23 May 2023 Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang

Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models that do not have access to private data, highlighting the promise of private learning as a tool for model compression and efficiency.

Model Compression Transfer Learning

ResiDual: Transformer with Dual Residual Connections

1 code implementation28 Apr 2023 Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

In this paper, we propose ResiDual, a novel Transformer architecture with Pre-Post-LN (PPLN), which fuses the connections in Post-LN and Pre-LN together and inherits their advantages while avoiding their limitations.

Machine Translation

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

no code implementations3 Dec 2022 Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization.

Computational Efficiency
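A hedged PyTorch sketch of the per-layer clipping idea: each layer's per-example gradient is clipped to its own bound before aggregation and noising. Per-example gradients are obtained here with a naive microbatch loop, and the model, clip norm, and noise multiplier are placeholders rather than the paper's calibrated settings.

```python
import torch
import torch.nn as nn

# Per-layer (group-wise) clipping sketch: clip each layer's per-example gradient
# separately, sum, then add Gaussian noise per layer before the update.
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
loss_fn = nn.CrossEntropyLoss()
clip_norm, noise_mult, lr = 1.0, 1.0, 0.1   # placeholder values, not calibrated

def dp_step(xb, yb):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                                  # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        for s, p in zip(summed, model.parameters()):
            scale = (clip_norm / (p.grad.norm() + 1e-12)).clamp(max=1.0)
            s += scale * p.grad                               # clip layer-wise
    with torch.no_grad():
        for s, p in zip(summed, model.parameters()):
            s += noise_mult * clip_norm * torch.randn_like(s) # per-layer noise
            p -= lr * s / len(xb)

dp_step(torch.randn(8, 20), torch.randint(0, 2, (8,)))
```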

Denoising Masked AutoEncoders Help Robust Classification

1 code implementation10 Oct 2022 Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, LiWei Wang, Di He

In this paper, we propose a new self-supervised method, which is called Denoising Masked AutoEncoders (DMAE), for learning certified robust classifiers of images.

Classification Denoising +1

Provable Adaptivity in Adam

no code implementations21 Aug 2022 Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen

In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.

Attribute

Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization

no code implementations27 Jun 2022 Xiaodong Yang, Huishuai Zhang, Wei Chen, Tie-Yan Liu

By ensuring differential privacy in the learning algorithms, one can rigorously mitigate the risk of large models memorizing sensitive training data.
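A minimal numpy sketch of a clipped-and-perturbed gradient step of the kind studied here; the clip threshold and noise scale are placeholders, not privacy-calibrated values.

```python
import numpy as np

def clipped_noisy_step(x, grad, lr=0.1, clip=1.0, sigma=0.5,
                       rng=np.random.default_rng(0)):
    """One clipped SGD step with Gaussian perturbation (illustrative constants)."""
    g = np.asarray(grad, dtype=float)
    g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))   # clip gradient norm
    g = g + sigma * clip * rng.standard_normal(g.shape)    # Gaussian perturbation
    return x - lr * g

x = np.array([1.0, -2.0])
x = clipped_noisy_step(x, grad=np.array([4.0, 3.0]))       # clipped from norm 5 to 1
```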

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

no code implementations9 Jun 2022 Huishuai Zhang, Da Yu, Yiping Lu, Di He

Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks.

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

1 code implementation6 Jun 2022 Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning.

Robust Quantity-Aware Aggregation for Federated Learning

no code implementations22 May 2022 Jingwei Yi, Fangzhao Wu, Huishuai Zhang, Bin Zhu, Tao Qi, Guangzhong Sun, Xing Xie

Federated learning (FL) enables multiple clients to collaboratively train models without sharing their local data, and becomes an important privacy-preserving machine learning framework.

Federated Learning Privacy Preserving

Availability Attacks Create Shortcuts

1 code implementation1 Nov 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We are the first to unveil an important population property of the perturbations of these attacks: they are almost \textbf{linearly separable} when assigned with the target labels of the corresponding samples, which hence can work as \emph{shortcuts} for the learning objective.

Data Poisoning
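A hedged sketch of how one might test the linear-separability property described above: fit a linear classifier on the perturbations themselves, labeled with their assigned target classes, and inspect the training accuracy. The `perturbations` and `labels` arrays below are random stand-ins for real attack outputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Linear-separability check on the flattened perturbations (delta = x_poisoned - x_clean)
# labeled with their target classes. Random stand-ins are used below; with real attack
# perturbations over the full training set (n >> d), the paper reports the perturbations
# are almost linearly separable.
rng = np.random.default_rng(0)
perturbations = rng.normal(size=(1000, 3 * 32 * 32))  # placeholder for real deltas
labels = rng.integers(0, 10, size=1000)               # placeholder target labels

clf = LogisticRegression(max_iter=2000).fit(perturbations, labels)
print("training accuracy:", clf.score(perturbations, labels))
```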

Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds
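A numpy sketch of the prescription in the snippet above: estimate the gradient covariance from per-sample gradients, take its matrix square root as the injected-noise covariance, and take a Langevin-style step. The $\sqrt{2\eta}$ noise scaling follows standard SGLD convention, and the inputs are placeholders.

```python
import numpy as np
from scipy.linalg import sqrtm

def anisotropic_sgld_step(x, grads, lr=1e-2, rng=np.random.default_rng(0)):
    """One SGLD-style step whose injected noise has covariance equal to the square
    root of the estimated gradient covariance, following the statement above.
    `grads` is an (n, d) array of per-sample gradients at x (placeholder input)."""
    g_mean = grads.mean(axis=0)
    C = np.cov(grads, rowvar=False)        # estimated gradient covariance
    Sigma = np.real(sqrtm(C))              # target noise covariance C^{1/2}
    B = np.real(sqrtm(Sigma))              # factor with B @ B.T = Sigma
    noise = B @ rng.standard_normal(x.shape)
    return x - lr * g_mean + np.sqrt(2 * lr) * noise   # standard SGLD noise scaling

grads = np.random.default_rng(1).standard_normal((128, 5))  # stand-in gradients
x_next = anisotropic_sgld_step(np.zeros(5), grads)
```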

Differentially Private Fine-tuning of Language Models

2 code implementations ICLR 2022 Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$.

Text Generation

Does Momentum Change the Implicit Regularization on Separable Data?

no code implementations8 Oct 2021 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms.

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

no code implementations29 Sep 2021 Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and flat minima selection.

Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles

no code implementations29 Sep 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

In contextual bandit, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.

Multi-Armed Bandits Thompson Sampling

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

no code implementations29 Jun 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function.

Multi-Armed Bandits

Large Scale Private Learning via Low-rank Reparametrization

1 code implementation17 Jun 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We propose a reparametrization scheme to address the challenges of applying differentially private SGD to large neural networks: 1) the huge memory cost of storing individual gradients, and 2) the added noise, whose scale suffers from a notorious dependence on the model dimension.
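A hypothetical PyTorch sketch of the low-rank reparametrization idea: the pretrained weight stays frozen and only a low-rank residual `L @ R` is trained, so gradients (and the DP noise added to them) live in a much smaller space. This is a simplified variant for illustration, not the paper's exact reparametrized gradient perturbation procedure.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Frozen base weight plus a trainable low-rank residual: W0 + L @ R.
    Only L and R receive gradients, shrinking the dimension that DP noise touches."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.weight0 = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(base.bias.detach().clone(), requires_grad=False)
        out_f, in_f = base.weight.shape
        self.L = nn.Parameter(torch.zeros(out_f, rank))
        self.R = nn.Parameter(torch.randn(rank, in_f) * 0.01)

    def forward(self, x):
        return x @ (self.weight0 + self.L @ self.R).t() + self.bias

layer = LowRankLinear(nn.Linear(768, 768), rank=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # trainable params
```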

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

1 code implementation CVPR 2022 Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu

Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.

Vocal Bursts Valence Prediction

Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds

Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

2 code implementations ICLR 2021 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu

The privacy leakage of a model about its training data can be bounded via the differential privacy mechanism.

BN-invariant sharpness regularizes the training model to better generalization

no code implementations8 Jan 2021 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of loss over a $\delta$ ball of parameters around minima, cannot give consistent measurement for scale invariant neural networks, e.g., networks with batch normalization layer.
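A small PyTorch demo of the scale invariance behind this observation, assuming a bias-free linear layer followed by BatchNorm in training mode: rescaling the layer's weights leaves the network's output unchanged, so a sharpness measure that is not scale-invariant can change arbitrarily without any change in the function.

```python
import torch
import torch.nn as nn

# Scaling the weights feeding a BatchNorm layer leaves the function unchanged,
# because BN normalizes away the scale (bias-free layer, BN in training mode).
torch.manual_seed(0)
x = torch.randn(64, 16)
lin = nn.Linear(16, 32, bias=False)
bn = nn.BatchNorm1d(32)

out1 = bn(lin(x))
with torch.no_grad():
    lin.weight *= 10.0           # rescale parameters by a factor of 10
out2 = bn(lin(x))
print(torch.allclose(out1, out2, atol=1e-4))  # True: the function is unchanged
```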

On the Stability of Multi-branch Network

no code implementations1 Jan 2021 Huishuai Zhang, Da Yu, Wei Chen, Tie-Yan Liu

More importantly, we propose a new design ``STAM aggregation" that can guarantee to STAbilize the forward/backward process of Multi-branch networks irrespective of the number of branches.

How Does Data Augmentation Affect Privacy in Machine Learning?

1 code implementation21 Jul 2020 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation.

BIG-bench Machine Learning Data Augmentation

Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia

1 code implementation29 Jun 2020 Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and minima selection.

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

no code implementations26 Nov 2019 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin

By using the \emph{expected curvature}, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods.

Stability and Convergence Theory for Learning ResNet: A Full Characterization

no code implementations25 Sep 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

We show that for standard initialization used in practice, $\tau =1/\Omega(\sqrt{L})$ is a sharp value in characterizing the stability of forward/backward process of ResNet, where $L$ is the number of residual blocks.
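A minimal sketch of the scaling this result suggests: residual blocks of the form $x + \tau f(x)$ with $\tau = 1/\sqrt{L}$, where $L$ is the number of residual blocks. The block body and widths below are illustrative.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block with output x + tau * f(x); tau = 1/sqrt(L) per the
    characterization above (L = total number of residual blocks)."""
    def __init__(self, dim, tau):
        super().__init__()
        self.tau = tau
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.tau * self.f(x)

L, dim = 100, 64
tau = 1.0 / (L ** 0.5)
net = nn.Sequential(*[ScaledResidualBlock(dim, tau) for _ in range(L)])
x = torch.randn(8, dim)
print(net(x).std())   # forward signal stays O(1) even for a deep stack
```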

The Effect of Adversarial Training: A Theoretical Characterization

no code implementations25 Sep 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.

Adversarial Attack

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

no code implementations29 May 2019 Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu

Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.

Optimization on Multiple Manifolds

no code implementations ICLR 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Optimization on manifolds has been widely used in machine learning to handle optimization problems with constraints.

G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations ICLR 2019 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?

Stabilize Deep ResNet with A Sharp Scaling Factor $\tau$

1 code implementation17 Mar 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

Moreover, for ResNets with normalization layer, adding such a factor $\tau$ also stabilizes the training and obtains significant performance gain for deep ResNet.

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

no code implementations ICLR 2019 Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks.

On the Local Hessian in Back-propagation

no code implementations NeurIPS 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

We study the Hessian of the local back-matching loss (local Hessian) and connect it to the efficiency of BP.

Capacity Control of ReLU Neural Networks by Basis-path Norm

no code implementations19 Sep 2018 Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu

Motivated by this, we propose a new norm \emph{Basis-path Norm} based on a group of linearly independent paths to measure the capacity of neural networks more accurately.

Train Feedforward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation

no code implementations27 Feb 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

This inconsistency of gradient magnitude across different layers renders optimization of a deep neural network with a single learning rate problematic.

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

no code implementations19 Feb 2018 Yi Zhou, Yingbin Liang, Huishuai Zhang

With strongly convex regularizers, we further establish the generalization error bounds for nonconvex loss functions under proximal SGD with high-probability guarantee, i.e., exponential concentration in probability.

$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations11 Feb 2018 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?

Block-diagonal Hessian-free Optimization for Training Neural Networks

no code implementations ICLR 2018 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence.

Second-order methods
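The core primitive behind a Hessian-free method is a Hessian-vector product computed with double backpropagation; restricting it to one parameter block at a time gives the block-diagonal variant. A hedged PyTorch sketch of that primitive (the conjugate-gradient solver it would feed is omitted):

```python
import torch
import torch.nn as nn

def block_hvp(loss, block_params, vec):
    """Hessian-vector product restricted to one parameter block, via double backprop.
    `vec` is a list of tensors with the same shapes as `block_params`."""
    grads = torch.autograd.grad(loss, block_params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, block_params, retain_graph=True)

model = nn.Sequential(nn.Linear(10, 20), nn.Tanh(), nn.Linear(20, 1))
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

block = list(model[0].parameters())             # treat layer 0 as one block
v = [torch.randn_like(p) for p in block]
hv = block_hvp(loss, block, v)                  # feed this to a per-block CG solver
print([t.shape for t in hv])
```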

Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent

no code implementations23 Sep 2017 Yuanxin Li, Yuejie Chi, Huishuai Zhang, Yingbin Liang

Recent work has demonstrated the effectiveness of gradient descent for directly recovering the factors of low-rank matrices from random linear measurements in a globally convergent manner when initialized properly.

Reshaped Wirtinger Flow for Solving Quadratic System of Equations

no code implementations NeurIPS 2016 Huishuai Zhang, Yingbin Liang

In contrast to the smooth loss function used in WF, we adopt a nonsmooth but lower-order loss function, and design a gradient-like algorithm (referred to as reshaped-WF).
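A real-valued numpy sketch of a reshaped-WF-style iteration: the loss acts on magnitudes $|a_i^\top x|$ rather than their squares, and the gradient-like step uses $\mathrm{sign}(a_i^\top x)$ where the magnitude is non-differentiable. The random initialization and fixed step size below are simplifications; the paper uses an initialization step and tuned constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 800
z = rng.standard_normal(n)                   # ground-truth signal
A = rng.standard_normal((m, n))              # measurement vectors as rows
y = np.abs(A @ z)                            # magnitude-only measurements

x = rng.standard_normal(n)
x *= np.linalg.norm(z) / np.linalg.norm(x)   # match the (estimable) signal norm
for _ in range(500):
    Ax = A @ x
    g = A.T @ (Ax - y * np.sign(Ax)) / m     # gradient-like step of the lower-order loss
    x -= 0.5 * g

err = min(np.linalg.norm(x - z), np.linalg.norm(x + z)) / np.linalg.norm(z)
print(f"relative error (up to global sign): {err:.3e}")
```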

Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations

1 code implementation25 May 2016 Huishuai Zhang, Yi Zhou, Yingbin Liang, Yuejie Chi

We further develop the incremental (stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges linearly to the true signal.

Retrieval

Median-Truncated Nonconvex Approach for Phase Retrieval with Outliers

no code implementations11 Mar 2016 Huishuai Zhang, Yuejie Chi, Yingbin Liang

This paper investigates the phase retrieval problem, which aims to recover a signal from the magnitudes of its linear measurements.

Retrieval
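A numpy sketch of the median-truncation idea applied to a Wirtinger-flow-style phase retrieval step: at each iteration, measurements whose residuals exceed a constant multiple of the median residual are dropped before the gradient is formed. The warm start, truncation constant, and step size are illustrative (the paper uses a truncated spectral initialization and calibrated constants).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 600
z = rng.standard_normal(n)                      # ground-truth signal
A = rng.standard_normal((m, n))                 # Gaussian measurement vectors
y = (A @ z) ** 2                                # intensity measurements
bad = rng.random(m) < 0.05                      # 5% arbitrary outliers
y[bad] += 50.0 * rng.standard_normal(bad.sum()) ** 2

x = z + 0.1 * rng.standard_normal(n)            # warm start stands in for spectral init
for _ in range(300):
    Ax = A @ x
    resid = np.abs(Ax ** 2 - y)
    keep = resid <= 5.0 * np.median(resid)      # median truncation drops gross outliers
    g = A[keep].T @ ((Ax[keep] ** 2 - y[keep]) * Ax[keep]) / keep.sum()
    x -= 0.05 * g / np.linalg.norm(x) ** 2      # small WF-style step

err = min(np.linalg.norm(x - z), np.linalg.norm(x + z)) / np.linalg.norm(z)
print(f"relative error (up to sign): {err:.3e}")
```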

Analysis of Robust PCA via Local Incoherence

no code implementations NeurIPS 2015 Huishuai Zhang, Yi Zhou, Yingbin Liang

We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank matrix and a sparse error matrix via the convex program Principal Component Pursuit (PCP).
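A compact numpy sketch of the convex PCP program being analyzed, solved with a standard inexact augmented Lagrangian / ADMM loop (soft-thresholding for the sparse part, singular-value thresholding for the low-rank part). The defaults $\lambda = 1/\sqrt{\max(m,n)}$ and the $\mu$ heuristic are common choices, not the paper's contribution.

```python
import numpy as np

def shrink(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def pcp(M, lam=None, mu=None, iters=200):
    """Principal Component Pursuit via inexact ALM: M ~= L (low-rank) + S (sparse)."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)       # singular-value thresholding
        S = shrink(M - L + Y / mu, lam / mu)    # entrywise soft-thresholding
        Y = Y + mu * (M - L - S)                # dual update
    return L, S

# toy example: rank-2 matrix plus 5% sparse corruptions
rng = np.random.default_rng(0)
L0 = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 80))
S0 = (rng.random((80, 80)) < 0.05) * rng.standard_normal((80, 80)) * 10
L_hat, S_hat = pcp(L0 + S0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```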
