Search Results for author: Huishuai Zhang

Found 55 papers, 17 papers with code

©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model

no code implementations18 Apr 2024 Chao Zhou, Huishuai Zhang, Jiang Bian, Weiming Zhang, Nenghai Yu

To mitigate this, we propose the \copyright Plug-in Authorization framework, introducing three operations: addition, extraction, and combination.

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

no code implementations22 Mar 2024 Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

1 code implementation4 Mar 2024 Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models.

Privacy Preserving

Exploring Transferability for Randomized Smoothing

no code implementations14 Dec 2023 Kai Qiu, Huishuai Zhang, Zhirong Wu, Stephen Lin

However, the model robustness, which is a critical aspect for safety, is often optimized for each specific task rather than at the pretraining stage.

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

no code implementations25 Nov 2023 Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun

Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory still remains elusive.

On the Generalization Properties of Diffusion Models

1 code implementation NeurIPS 2023 Puheng Li, Zhong Li, Huishuai Zhang, Jiang Bian

This precisely elucidates the adverse effect of "modes shift" in ground truths on the model generalization.

Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity

no code implementations27 Oct 2023 Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen

Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization under an $L$-smooth condition and a bounded noise variance assumption.

LEMMA valid

When and Why Momentum Accelerates SGD: An Empirical Study

no code implementations15 Jun 2023 Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Nanning Zheng

In the comparison of SGDM and SGD with the same effective learning rate and the same batch size, we observe a consistent pattern: when $\eta_{ef}$ is small, SGDM and SGD experience almost the same empirical training losses; when $\eta_{ef}$ surpasses a certain threshold, SGDM begins to perform better than SGD.
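A minimal sketch of this comparison on a toy noisy quadratic, assuming the usual definition $\eta_{ef} = \eta/(1-\beta)$ for heavy-ball momentum; the objective, constants, and step counts below are illustrative and not the paper's experimental setup.

```python
import numpy as np

# Compare SGD and SGDM at the same effective learning rate eta_ef = eta / (1 - beta)
# on a noisy ill-conditioned quadratic (illustrative toy setting only).
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])                 # quadratic 0.5 * x^T A x

def noisy_grad(x):
    return A @ x + 0.1 * rng.standard_normal(2)

def run(eta, beta, steps=500):
    x, m = np.array([5.0, 5.0]), np.zeros(2)
    for _ in range(steps):
        m = beta * m + noisy_grad(x)     # heavy-ball momentum buffer
        x = x - eta * m
    return 0.5 * x @ A @ x

eta_ef, beta = 0.05, 0.9
loss_sgd = run(eta=eta_ef, beta=0.0)                 # plain SGD at eta_ef
loss_sgdm = run(eta=eta_ef * (1 - beta), beta=beta)  # SGDM with the same eta_ef
print(f"SGD  loss: {loss_sgd:.4f}")
print(f"SGDM loss: {loss_sgdm:.4f}")
```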

UADB: Unsupervised Anomaly Detection Booster

1 code implementation3 Jun 2023 Hangting Ye, Zhining Liu, Xinyi Shen, Wei Cao, Shun Zheng, Xiaofan Gui, Huishuai Zhang, Yi Chang, Jiang Bian

This is a challenging task given the heterogeneous model structures and assumptions adopted by existing UAD methods.

Unsupervised Anomaly Detection

Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions

no code implementations29 May 2023 Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen

We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions.
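A minimal sketch of the (diagonal) AdaGrad update analyzed here; the toy objective and step size are illustrative, and the affine-noise-variance and bounded-smoothness assumptions of the proof are not modeled.

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.1, eps=1e-8, steps=1000):
    """Diagonal AdaGrad: accumulate squared gradients, scale steps coordinate-wise."""
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        v += g * g                           # running sum of squared gradients
        x -= lr * g / (np.sqrt(v) + eps)     # coordinate-wise adaptive step
    return x

# toy non-convex objective: f(x) = sum(x^2 * sin(x)^2)
grad = lambda x: 2 * x * np.sin(x) ** 2 + 2 * x ** 2 * np.sin(x) * np.cos(x)
print(adagrad(grad, x0=[2.0, -3.0]))
```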

Selective Pre-training for Private Fine-tuning

1 code implementation23 May 2023 Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang

Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models that do not have access to private data, highlighting the promise of private learning as a tool for model compression and efficiency.

Model Compression Transfer Learning

ResiDual: Transformer with Dual Residual Connections

1 code implementation28 Apr 2023 Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

In this paper, we propose ResiDual, a novel Transformer architecture with Pre-Post-LN (PPLN), which fuses the connections in Post-LN and Pre-LN together and inherits their advantages while avoiding their limitations.

Machine Translation

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

no code implementations3 Dec 2022 Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization.

Computational Efficiency
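A hedged PyTorch sketch of the per-layer clipping idea: each layer's per-example gradient is clipped to its own bound before aggregation and noising. Per-example gradients are obtained here with a naive microbatch loop, and the model, clip norm, and noise multiplier are placeholders rather than the paper's calibrated settings.

```python
import torch
import torch.nn as nn

# Per-layer (group-wise) clipping sketch: clip each layer's per-example gradient
# separately, sum, then add Gaussian noise per layer before the update.
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
loss_fn = nn.CrossEntropyLoss()
clip_norm, noise_mult, lr = 1.0, 1.0, 0.1   # placeholder values, not calibrated

def dp_step(xb, yb):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                                  # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        for s, p in zip(summed, model.parameters()):
            scale = (clip_norm / (p.grad.norm() + 1e-12)).clamp(max=1.0)
            s += scale * p.grad                               # clip layer-wise
    with torch.no_grad():
        for s, p in zip(summed, model.parameters()):
            s += noise_mult * clip_norm * torch.randn_like(s) # per-layer noise
            p -= lr * s / len(xb)

dp_step(torch.randn(8, 20), torch.randint(0, 2, (8,)))
```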

Denoising Masked AutoEncoders Help Robust Classification

1 code implementation10 Oct 2022 Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, LiWei Wang, Di He

In this paper, we propose a new self-supervised method, which is called Denoising Masked AutoEncoders (DMAE), for learning certified robust classifiers of images.

Classification Denoising +1

Provable Adaptivity in Adam

no code implementations21 Aug 2022 Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen

In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.

Attribute

Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization

no code implementations27 Jun 2022 Xiaodong Yang, Huishuai Zhang, Wei Chen, Tie-Yan Liu

By ensuring differential privacy in the learning algorithms, one can rigorously mitigate the risk of large models memorizing sensitive training data.
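A minimal numpy sketch of a clipped-and-perturbed gradient step of the kind studied here; the clip threshold and noise scale are placeholders, not privacy-calibrated values.

```python
import numpy as np

def clipped_noisy_step(x, grad, lr=0.1, clip=1.0, sigma=0.5,
                       rng=np.random.default_rng(0)):
    """One clipped SGD step with Gaussian perturbation (illustrative constants)."""
    g = np.asarray(grad, dtype=float)
    g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))   # clip gradient norm
    g = g + sigma * clip * rng.standard_normal(g.shape)    # Gaussian perturbation
    return x - lr * g

x = np.array([1.0, -2.0])
x = clipped_noisy_step(x, grad=np.array([4.0, 3.0]))       # clipped from norm 5 to 1
```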

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

no code implementations9 Jun 2022 Huishuai Zhang, Da Yu, Yiping Lu, Di He

Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks.

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

1 code implementation6 Jun 2022 Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning.

Robust Quantity-Aware Aggregation for Federated Learning

no code implementations22 May 2022 Jingwei Yi, Fangzhao Wu, Huishuai Zhang, Bin Zhu, Tao Qi, Guangzhong Sun, Xing Xie

Federated learning (FL) enables multiple clients to collaboratively train models without sharing their local data, and becomes an important privacy-preserving machine learning framework.

Federated Learning Privacy Preserving

Availability Attacks Create Shortcuts

1 code implementation1 Nov 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We are the first to unveil an important population property of the perturbations of these attacks: they are almost \textbf{linearly separable} when assigned with the target labels of the corresponding samples, which hence can work as \emph{shortcuts} for the learning objective.

Data Poisoning
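A hedged sketch of how one might test the linear-separability property described above: fit a linear classifier on the perturbations themselves, labeled with their assigned target classes, and inspect the training accuracy. The `perturbations` and `labels` arrays below are random stand-ins for real attack outputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Linear-separability check on the flattened perturbations (delta = x_poisoned - x_clean)
# labeled with their target classes. Random stand-ins are used below; with real attack
# perturbations over the full training set (n >> d), the paper reports the perturbations
# are almost linearly separable.
rng = np.random.default_rng(0)
perturbations = rng.normal(size=(1000, 3 * 32 * 32))  # placeholder for real deltas
labels = rng.integers(0, 10, size=1000)               # placeholder target labels

clf = LogisticRegression(max_iter=2000).fit(perturbations, labels)
print("training accuracy:", clf.score(perturbations, labels))
```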

Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds
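A numpy sketch of the prescription in the snippet above: estimate the gradient covariance from per-sample gradients, take its matrix square root as the injected-noise covariance, and take a Langevin-style step. The $\sqrt{2\eta}$ noise scaling follows standard SGLD convention, and the inputs are placeholders.

```python
import numpy as np
from scipy.linalg import sqrtm

def anisotropic_sgld_step(x, grads, lr=1e-2, rng=np.random.default_rng(0)):
    """One SGLD-style step whose injected noise has covariance equal to the square
    root of the estimated gradient covariance, following the statement above.
    `grads` is an (n, d) array of per-sample gradients at x (placeholder input)."""
    g_mean = grads.mean(axis=0)
    C = np.cov(grads, rowvar=False)        # estimated gradient covariance
    Sigma = np.real(sqrtm(C))              # target noise covariance C^{1/2}
    B = np.real(sqrtm(Sigma))              # factor with B @ B.T = Sigma
    noise = B @ rng.standard_normal(x.shape)
    return x - lr * g_mean + np.sqrt(2 * lr) * noise   # standard SGLD noise scaling

grads = np.random.default_rng(1).standard_normal((128, 5))  # stand-in gradients
x_next = anisotropic_sgld_step(np.zeros(5), grads)
```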

Differentially Private Fine-tuning of Language Models

2 code implementations ICLR 2022 Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$.

Text Generation

Does Momentum Change the Implicit Regularization on Separable Data?

no code implementations8 Oct 2021 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms.

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

no code implementations29 Sep 2021 Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and flat minima selection.

Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles

no code implementations29 Sep 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

In contextual bandit, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.

Multi-Armed Bandits Thompson Sampling

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

no code implementations29 Jun 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function.

Multi-Armed Bandits

Large Scale Private Learning via Low-rank Reparametrization

1 code implementation17 Jun 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We propose a reparametrization scheme to address the challenges of applying differentially private SGD to large neural networks: 1) the huge memory cost of storing individual gradients, and 2) the added noise, whose scale suffers from a notorious dependence on the model dimension.
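A hypothetical PyTorch sketch of the low-rank reparametrization idea: the pretrained weight stays frozen and only a low-rank residual `L @ R` is trained, so gradients (and the DP noise added to them) live in a much smaller space. This is a simplified variant for illustration, not the paper's exact reparametrized gradient perturbation procedure.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Frozen base weight plus a trainable low-rank residual: W0 + L @ R.
    Only L and R receive gradients, shrinking the dimension that DP noise touches."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.weight0 = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(base.bias.detach().clone(), requires_grad=False)
        out_f, in_f = base.weight.shape
        self.L = nn.Parameter(torch.zeros(out_f, rank))
        self.R = nn.Parameter(torch.randn(rank, in_f) * 0.01)

    def forward(self, x):
        return x @ (self.weight0 + self.L @ self.R).t() + self.bias

layer = LowRankLinear(nn.Linear(768, 768), rank=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # trainable params
```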

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

1 code implementation CVPR 2022 Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu

Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.

Vocal Bursts Valence Prediction

Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds

Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

2 code implementations ICLR 2021 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu

The privacy leakage of a model about its training data can be bounded via the differential privacy mechanism.

BN-invariant sharpness regularizes the training model to better generalization

no code implementations8 Jan 2021 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of loss over a $\delta$ ball of parameters around minima, cannot give consistent measurement for scale invariant neural networks, e.g., networks with batch normalization layer.
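A small PyTorch demo of the scale invariance behind this observation, assuming a bias-free linear layer followed by BatchNorm in training mode: rescaling the layer's weights leaves the network's output unchanged, so a sharpness measure that is not scale-invariant can change arbitrarily without any change in the function.

```python
import torch
import torch.nn as nn

# Scaling the weights feeding a BatchNorm layer leaves the function unchanged,
# because BN normalizes away the scale (bias-free layer, BN in training mode).
torch.manual_seed(0)
x = torch.randn(64, 16)
lin = nn.Linear(16, 32, bias=False)
bn = nn.BatchNorm1d(32)

out1 = bn(lin(x))
with torch.no_grad():
    lin.weight *= 10.0           # rescale parameters by a factor of 10
out2 = bn(lin(x))
print(torch.allclose(out1, out2, atol=1e-4))  # True: the function is unchanged
```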

On the Stability of Multi-branch Network

no code implementations1 Jan 2021 Huishuai Zhang, Da Yu, Wei Chen, Tie-Yan Liu

More importantly, we propose a new design ``STAM aggregation" that can guarantee to STAbilize the forward/backward process of Multi-branch networks irrespective of the number of branches.

How Does Data Augmentation Affect Privacy in Machine Learning?

1 code implementation21 Jul 2020 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation.

BIG-bench Machine Learning Data Augmentation

Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia

1 code implementation29 Jun 2020 Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and minima selection.

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

no code implementations26 Nov 2019 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin

By using the \emph{expected curvature}, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods.

Stability and Convergence Theory for Learning ResNet: A Full Characterization

no code implementations25 Sep 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

We show that for standard initialization used in practice, $\tau =1/\Omega(\sqrt{L})$ is a sharp value in characterizing the stability of forward/backward process of ResNet, where $L$ is the number of residual blocks.
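A minimal sketch of the scaling this result suggests: residual blocks of the form $x + \tau f(x)$ with $\tau = 1/\sqrt{L}$, where $L$ is the number of residual blocks. The block body and widths below are illustrative.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block with output x + tau * f(x); tau = 1/sqrt(L) per the
    characterization above (L = total number of residual blocks)."""
    def __init__(self, dim, tau):
        super().__init__()
        self.tau = tau
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.tau * self.f(x)

L, dim = 100, 64
tau = 1.0 / (L ** 0.5)
net = nn.Sequential(*[ScaledResidualBlock(dim, tau) for _ in range(L)])
x = torch.randn(8, dim)
print(net(x).std())   # forward signal stays O(1) even for a deep stack
```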

The Effect of Adversarial Training: A Theoretical Characterization

no code implementations25 Sep 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.

Adversarial Attack

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

no code implementations29 May 2019 Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu

Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.

Optimization on Multiple Manifolds

no code implementations ICLR 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Optimization on manifolds has been widely used in machine learning to handle optimization problems with constraints.

G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations ICLR 2019 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?

Stabilize Deep ResNet with A Sharp Scaling Factor $\tau$

1 code implementation17 Mar 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

Moreover, for ResNets with normalization layer, adding such a factor $\tau$ also stabilizes the training and obtains significant performance gain for deep ResNet.

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

no code implementations ICLR 2019 Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks.

On the Local Hessian in Back-propagation

no code implementations NeurIPS 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

We study the Hessian of the local back-matching loss (local Hessian) and connect it to the efficiency of BP.

Capacity Control of ReLU Neural Networks by Basis-path Norm

no code implementations19 Sep 2018 Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu

Motivated by this, we propose a new norm \emph{Basis-path Norm} based on a group of linearly independent paths to measure the capacity of neural networks more accurately.

Train Feedforward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation

no code implementations27 Feb 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

This inconsistency of gradient magnitude across different layers renders optimization of a deep neural network with a single learning rate problematic.

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

no code implementations19 Feb 2018 Yi Zhou, Yingbin Liang, Huishuai Zhang

With strongly convex regularizers, we further establish the generalization error bounds for nonconvex loss functions under proximal SGD with high-probability guarantee, i.e., exponential concentration in probability.

$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations11 Feb 2018 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process}?

Block-diagonal Hessian-free Optimization for Training Neural Networks

no code implementations ICLR 2018 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence.

Second-order methods
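The core primitive behind a Hessian-free method is a Hessian-vector product computed with double backpropagation; restricting it to one parameter block at a time gives the block-diagonal variant. A hedged PyTorch sketch of that primitive (the conjugate-gradient solver it would feed is omitted):

```python
import torch
import torch.nn as nn

def block_hvp(loss, block_params, vec):
    """Hessian-vector product restricted to one parameter block, via double backprop.
    `vec` is a list of tensors with the same shapes as `block_params`."""
    grads = torch.autograd.grad(loss, block_params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, block_params, retain_graph=True)

model = nn.Sequential(nn.Linear(10, 20), nn.Tanh(), nn.Linear(20, 1))
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

block = list(model[0].parameters())             # treat layer 0 as one block
v = [torch.randn_like(p) for p in block]
hv = block_hvp(loss, block, v)                  # feed this to a per-block CG solver
print([t.shape for t in hv])
```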

Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent

no code implementations23 Sep 2017 Yuanxin Li, Yuejie Chi, Huishuai Zhang, Yingbin Liang

Recent work has demonstrated the effectiveness of gradient descent for directly recovering the factors of low-rank matrices from random linear measurements in a globally convergent manner when initialized properly.

Reshaped Wirtinger Flow for Solving Quadratic System of Equations

no code implementations NeurIPS 2016 Huishuai Zhang, Yingbin Liang

In contrast to the smooth loss function used in WF, we adopt a nonsmooth but lower-order loss function, and design a gradient-like algorithm (referred to as reshaped-WF).
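A real-valued numpy sketch of a reshaped-WF-style iteration: the loss acts on magnitudes $|a_i^\top x|$ rather than their squares, and the gradient-like step uses $\mathrm{sign}(a_i^\top x)$ where the magnitude is non-differentiable. The random initialization and fixed step size below are simplifications; the paper uses an initialization step and tuned constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 800
z = rng.standard_normal(n)                   # ground-truth signal
A = rng.standard_normal((m, n))              # measurement vectors as rows
y = np.abs(A @ z)                            # magnitude-only measurements

x = rng.standard_normal(n)
x *= np.linalg.norm(z) / np.linalg.norm(x)   # match the (estimable) signal norm
for _ in range(500):
    Ax = A @ x
    g = A.T @ (Ax - y * np.sign(Ax)) / m     # gradient-like step of the lower-order loss
    x -= 0.5 * g

err = min(np.linalg.norm(x - z), np.linalg.norm(x + z)) / np.linalg.norm(z)
print(f"relative error (up to global sign): {err:.3e}")
```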

Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations

1 code implementation25 May 2016 Huishuai Zhang, Yi Zhou, Yingbin Liang, Yuejie Chi

We further develop the incremental (stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges linearly to the true signal.

Retrieval

Median-Truncated Nonconvex Approach for Phase Retrieval with Outliers

no code implementations11 Mar 2016 Huishuai Zhang, Yuejie Chi, Yingbin Liang

This paper investigates the phase retrieval problem, which aims to recover a signal from the magnitudes of its linear measurements.

Retrieval
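A numpy sketch of the median-truncation idea applied to a Wirtinger-flow-style phase retrieval step: at each iteration, measurements whose residuals exceed a constant multiple of the median residual are dropped before the gradient is formed. The warm start, truncation constant, and step size are illustrative (the paper uses a truncated spectral initialization and calibrated constants).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 600
z = rng.standard_normal(n)                      # ground-truth signal
A = rng.standard_normal((m, n))                 # Gaussian measurement vectors
y = (A @ z) ** 2                                # intensity measurements
bad = rng.random(m) < 0.05                      # 5% arbitrary outliers
y[bad] += 50.0 * rng.standard_normal(bad.sum()) ** 2

x = z + 0.1 * rng.standard_normal(n)            # warm start stands in for spectral init
for _ in range(300):
    Ax = A @ x
    resid = np.abs(Ax ** 2 - y)
    keep = resid <= 5.0 * np.median(resid)      # median truncation drops gross outliers
    g = A[keep].T @ ((Ax[keep] ** 2 - y[keep]) * Ax[keep]) / keep.sum()
    x -= 0.05 * g / np.linalg.norm(x) ** 2      # small WF-style step

err = min(np.linalg.norm(x - z), np.linalg.norm(x + z)) / np.linalg.norm(z)
print(f"relative error (up to sign): {err:.3e}")
```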

Analysis of Robust PCA via Local Incoherence

no code implementations NeurIPS 2015 Huishuai Zhang, Yi Zhou, Yingbin Liang

We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank matrix and a sparse error matrix via the convex program Principal Component Pursuit (PCP).
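A compact numpy sketch of the convex PCP program being analyzed, solved with a standard inexact augmented Lagrangian / ADMM loop (soft-thresholding for the sparse part, singular-value thresholding for the low-rank part). The defaults $\lambda = 1/\sqrt{\max(m,n)}$ and the $\mu$ heuristic are common choices, not the paper's contribution.

```python
import numpy as np

def shrink(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def pcp(M, lam=None, mu=None, iters=200):
    """Principal Component Pursuit via inexact ALM: M ~= L (low-rank) + S (sparse)."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)       # singular-value thresholding
        S = shrink(M - L + Y / mu, lam / mu)    # entrywise soft-thresholding
        Y = Y + mu * (M - L - S)                # dual update
    return L, S

# toy example: rank-2 matrix plus 5% sparse corruptions
rng = np.random.default_rng(0)
L0 = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 80))
S0 = (rng.random((80, 80)) < 0.05) * rng.standard_normal((80, 80)) * 10
L_hat, S_hat = pcp(L0 + S0)
print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```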
