Search Results for author: Huishuai Zhang

Found 64 papers, 22 papers with code

AIDBench: A benchmark for evaluating the authorship identification capability of large language models

no code implementations20 Nov 2024 Zichen Wen, Dadi Guo, Huishuai Zhang

As large language models (LLMs) rapidly advance and integrate into daily life, the privacy risks they pose are attracting increasing attention.

RAG

Understanding Multimodal Hallucination with Parameter-Free Representation Alignment

1 code implementation2 Sep 2024 Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao

To analyze image representations while completely avoiding the influence of all factors other than the image representation itself, we propose a parameter-free representation alignment metric (Pfram) that can measure the similarity between any two representation systems without requiring additional training parameters.

Hallucination Object +1

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

1 code implementation28 Aug 2024 Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao

While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models.

Mamba

Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering

no code implementations27 Aug 2024 Haowei Du, Huishuai Zhang, Dongyan Zhao

To address hallucination in generative question answering (GQA), where the answer cannot be derived from the document, we propose a novel evidence-enhanced triplet generation framework, EATQA, which encourages the model to predict all combinations of the (Question, Evidence, Answer) triplet by flipping the source pair and the target label to understand their logical relationships, i.e., to predict Answer (A), Question (Q), and Evidence (E) given the QE, EA, and QA pairs, respectively.

Generative Question Answering Hallucination +1

Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

no code implementations9 Jul 2024 Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

More interestingly, with a fixed parameter budget, MoM-large enables over a 38% increase in the depth of computation graphs compared to GPT-2-large, resulting in absolute gains of 1.4 on GLUE and 1 on XSUM.

Efficient Continual Pre-training by Mitigating the Stability Gap

no code implementations21 Jun 2024 Yiduo Guo, Jie Fu, Huishuai Zhang, Dongyan Zhao, Yikang Shen

This process involves updating the pre-trained LLM with a corpus from a new domain, resulting in a shift in the training distribution.

Automatic Jailbreaking of the Text-to-Image Generative AI Systems

1 code implementation26 May 2024 Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang

Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs).

Image Generation Information Retrieval +2

©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model

no code implementations18 Apr 2024 Chao Zhou, Huishuai Zhang, Jiang Bian, Weiming Zhang, Nenghai Yu

To mitigate this, we propose the ©Plug-in Authorization framework, introducing three operations: addition, extraction, and combination.

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

no code implementations22 Mar 2024 Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.
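For context, the "non-uniform smoothness" studied in this line of work usually refers to the $(L_0, L_1)$-smoothness condition of Zhang et al. (2020) (notation mine):
$$\|\nabla^2 f(x)\| \le L_0 + L_1\,\|\nabla f(x)\|,$$
under which the local smoothness can grow with the gradient norm instead of being bounded by a single constant $L$.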

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

2 code implementations4 Mar 2024 Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models.

Privacy Preserving

Exploring Transferability for Randomized Smoothing

no code implementations14 Dec 2023 Kai Qiu, Huishuai Zhang, Zhirong Wu, Stephen Lin

However, the model robustness, which is a critical aspect for safety, is often optimized for each specific task rather than at the pretraining stage.

Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults

no code implementations25 Nov 2023 Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun

Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains elusive.

On the Generalization Properties of Diffusion Models

1 code implementation NeurIPS 2023 Puheng Li, Zhong Li, Huishuai Zhang, Jiang Bian

This precisely elucidates the adverse effect of "modes shift" in ground truths on the model generalization.

Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity

no code implementations27 Oct 2023 Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen

Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization under an $L$-smooth condition and a bounded noise variance assumption.


When and Why Momentum Accelerates SGD:An Empirical Study

no code implementations15 Jun 2023 Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Nanning Zheng

In the comparison of SGDM and SGD with the same effective learning rate and the same batch size, we observe a consistent pattern: when $\eta_{ef}$ is small, SGDM and SGD experience almost the same empirical training losses; when $\eta_{ef}$ surpasses a certain threshold, SGDM begins to perform better than SGD.
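Here $\eta_{ef}$ denotes the effective learning rate; for SGDM with learning rate $\eta$ and momentum coefficient $\mu$, the usual definition (which this comparison presumably uses) is
$$\eta_{ef} = \frac{\eta}{1-\mu},$$
i.e., the SGD step size whose long-run update magnitude matches that of the accumulated momentum updates.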

UADB: Unsupervised Anomaly Detection Booster

1 code implementation3 Jun 2023 Hangting Ye, Zhining Liu, Xinyi Shen, Wei Cao, Shun Zheng, Xiaofan Gui, Huishuai Zhang, Yi Chang, Jiang Bian

This is a challenging task given the heterogeneous model structures and assumptions adopted by existing UAD methods.

Unsupervised Anomaly Detection

Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions

no code implementations29 May 2023 Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen

We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions.
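For reference, the per-coordinate AdaGrad update such analyses typically cover is
$$v_t = v_{t-1} + g_t^2,\qquad x_{t+1} = x_t - \frac{\eta}{\sqrt{v_t}+\epsilon}\,g_t,$$
with all operations elementwise, and the affine noise variance assumption reads $\mathbb{E}\,\|g_t-\nabla f(x_t)\|^2 \le \sigma_0^2 + \sigma_1^2\,\|\nabla f(x_t)\|^2$ (notation mine).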

Selective Pre-training for Private Fine-tuning

1 code implementation23 May 2023 Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang

In this work, we show that a careful pre-training on a subset of the public dataset that is guided by the private dataset is crucial to train small language models with differential privacy.

Model Compression Transfer Learning

ResiDual: Transformer with Dual Residual Connections

1 code implementation28 Apr 2023 Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

In this paper, we propose ResiDual, a novel Transformer architecture with Pre-Post-LN (PPLN), which fuses the connections in Post-LN and Pre-LN together and inherits their advantages while avoiding their limitations.

Machine Translation

Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

no code implementations3 Dec 2022 Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

To reduce the compute time overhead of private learning, we show that per-layer clipping, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization.

Computational Efficiency
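A minimal NumPy sketch of the per-layer clipping idea: each layer's per-example gradient is clipped to its own threshold before aggregation and noise addition. The function name, the flat per-example gradient layout, and the uniform noise calibration are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def private_step(per_example_grads, clip_thresholds, noise_multiplier, lr, params, rng):
    """One DP update with per-layer clipping (illustrative sketch).

    per_example_grads: list over layers; each entry has shape (batch, *layer_shape).
    clip_thresholds:   list of per-layer clipping norms C_l.
    """
    batch = per_example_grads[0].shape[0]
    noisy_grads = []
    for g, c in zip(per_example_grads, clip_thresholds):
        flat = g.reshape(batch, -1)
        # Clip each example's gradient for this layer to norm at most c.
        norms = np.linalg.norm(flat, axis=1, keepdims=True)
        clipped = flat * np.minimum(1.0, c / (norms + 1e-12))
        # Sum over the batch and add Gaussian noise calibrated to this layer's threshold.
        summed = clipped.sum(axis=0)
        summed += rng.normal(0.0, noise_multiplier * c, size=summed.shape)
        noisy_grads.append((summed / batch).reshape(g.shape[1:]))
    return [p - lr * ng for p, ng in zip(params, noisy_grads)]
```

Because each layer's clipping only requires that layer's per-example gradients, it can be interleaved with the backward pass, which is the compute-time benefit the snippet describes.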

Denoising Masked AutoEncoders Help Robust Classification

1 code implementation10 Oct 2022 Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, LiWei Wang, Di He

In this paper, we propose a new self-supervised method, which is called Denoising Masked AutoEncoders (DMAE), for learning certified robust classifiers of images.

Classification Decoder +2

Provable Adaptivity of Adam under Non-uniform Smoothness

no code implementations21 Aug 2022 Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Tie-Yan Liu, Zhi-Quan Luo, Wei Chen

We present the first convergence analysis of RR Adam without the bounded smoothness assumption.

Attribute

Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization

no code implementations27 Jun 2022 Xiaodong Yang, Huishuai Zhang, Wei Chen, Tie-Yan Liu

By ensuring differential privacy in the learning algorithms, one can rigorously mitigate the risk of large models memorizing sensitive training data.

Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks

no code implementations9 Jun 2022 Huishuai Zhang, Da Yu, Yiping Lu, Di He

Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks.

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

1 code implementation6 Jun 2022 Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning.
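For reference, the standard DP-SGD step (Abadi et al., 2016) that such accounting refines clips each per-example gradient and adds Gaussian noise:
$$\tilde g_t = \frac{1}{|B_t|}\Big(\sum_{i\in B_t} g_t^{(i)}\cdot\min\!\Big(1,\tfrac{C}{\|g_t^{(i)}\|_2}\Big) + \mathcal N\big(0,\sigma^2 C^2 I\big)\Big),\qquad \theta_{t+1} = \theta_t - \eta_t\,\tilde g_t .$$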

Robust Quantity-Aware Aggregation for Federated Learning

no code implementations22 May 2022 Jingwei Yi, Fangzhao Wu, Huishuai Zhang, Bin Zhu, Tao Qi, Guangzhong Sun, Xing Xie

Federated learning (FL) enables multiple clients to collaboratively train models without sharing their local data, and becomes an important privacy-preserving machine learning framework.

Federated Learning Privacy Preserving

Availability Attacks Create Shortcuts

1 code implementation1 Nov 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We are the first to unveil an important population property of the perturbations of these attacks: they are almost linearly separable when assigned the target labels of the corresponding samples, and hence can work as shortcuts for the learning objective.

Data Poisoning
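A hedged sketch of how one might verify that separability claim empirically: fit a linear classifier on the perturbations alone, labeled with the attack's target labels, and check the training accuracy. The helper name and the scikit-learn choice are assumptions for illustration, not the paper's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def separability_score(perturbations, target_labels):
    """Train-set accuracy of a linear classifier fit on the perturbations alone.

    perturbations: (n, d) array of availability-attack noises (flattened).
    target_labels: (n,) array of the labels the attack assigns to each sample.
    A score close to 1.0 means the noises are (nearly) linearly separable.
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(perturbations, target_labels)
    return clf.score(perturbations, target_labels)
```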

Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds
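In symbols (notation mine, up to centering conventions), the stated result is that the optimized SGLD noise covariance satisfies
$$\Sigma^{\star} \propto \Big(\mathbb{E}\big[\nabla \ell(w)\,\nabla \ell(w)^{\top}\big]\Big)^{1/2},$$
i.e., the matrix square root of the expected gradient covariance, rather than an isotropic $\sigma^2 I$.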

Differentially Private Fine-tuning of Language Models

2 code implementations ICLR 2022 Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$.

Text Generation

Does Momentum Change the Implicit Regularization on Separable Data?

no code implementations8 Oct 2021 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms.

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

no code implementations29 Sep 2021 Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and flat minima selection.

Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles

no code implementations29 Sep 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

In contextual bandit, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.

Thompson Sampling

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

no code implementations29 Jun 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with a deep neural network as the reward function.

Multi-Armed Bandits

Large Scale Private Learning via Low-rank Reparametrization

1 code implementation17 Jun 2021 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

We propose a reparametrization scheme to address the challenges of applying differentially private SGD on large neural networks, which are (1) the huge memory cost of storing individual gradients and (2) the added noise suffering from a notorious dimensional dependence.
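A minimal sketch of the reparametrization idea, with names and initialization chosen for illustration: the pretrained weight is frozen and only a low-rank residual is trained, so per-example gradients (and the injected noise) live in a much smaller parameter space.

```python
import numpy as np

class LowRankLinear:
    """y = x @ (W0 + L @ R): W0 is frozen, only the low-rank factors L, R are trained."""

    def __init__(self, w0, rank, rng):
        d_in, d_out = w0.shape
        self.w0 = w0                                    # frozen pretrained weight
        self.L = rng.normal(0, 0.02, size=(d_in, rank))
        self.R = np.zeros((rank, d_out))                # start as an exact no-op

    def forward(self, x):
        return x @ (self.w0 + self.L @ self.R)
```

DP-SGD would then be applied only to L and R, addressing both the per-example gradient memory cost and the dimension dependence of the noise.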

Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart

1 code implementation CVPR 2022 Tianyu Pang, Huishuai Zhang, Di He, Yinpeng Dong, Hang Su, Wei Chen, Jun Zhu, Tie-Yan Liu

Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones.

Vocal Bursts Valence Prediction

Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized.

Generalization Bounds

Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

2 code implementations ICLR 2021 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu

The privacy leakage of the model about the training data can be bounded via the differential privacy mechanism.

BN-invariant sharpness regularizes the training model to better generalization

no code implementations8 Jan 2021 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

However, it has been pointed out that the usual definitions of sharpness, which consider either the maxima or the integral of the loss over a $\delta$ ball of parameters around minima, cannot give a consistent measurement for scale-invariant neural networks, e.g., networks with a batch normalization layer.
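Concretely, the max-based sharpness the snippet refers to is (up to normalization, notation mine)
$$S_\delta(w) = \max_{\|\epsilon\|\le\delta} L(w+\epsilon) - L(w),$$
which is not invariant under the parameter rescalings that leave a batch-normalized network's function unchanged; this is the inconsistency the BN-invariant sharpness is designed to remove.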

On the Stability of Multi-branch Network

no code implementations1 Jan 2021 Huishuai Zhang, Da Yu, Wei Chen, Tie-Yan Liu

More importantly, we propose a new design, "STAM aggregation", that can guarantee to STAbilize the forward/backward process of Multi-branch networks irrespective of the number of branches.

How Does Data Augmentation Affect Privacy in Machine Learning?

1 code implementation21 Jul 2020 Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, Tie-Yan Liu

Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation.

BIG-bench Machine Learning Data Augmentation

Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia

1 code implementation29 Jun 2020 Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and minima selection.

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

no code implementations26 Nov 2019 Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin

By using the expected curvature, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods.

The Effect of Adversarial Training: A Theoretical Characterization

no code implementations25 Sep 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.

Adversarial Attack

Stability and Convergence Theory for Learning ResNet: A Full Characterization

no code implementations25 Sep 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

We show that for standard initialization used in practice, $\tau = 1/\Omega(\sqrt{L})$ is a sharp value in characterizing the stability of the forward/backward process of ResNet, where $L$ is the number of residual blocks.
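Assuming the scaled residual recursion studied in this line of work (notation mine),
$$x_{l+1} = x_l + \tau\, f(x_l; W_l),\qquad l = 1,\dots,L,$$
the result says stability of the forward/backward process holds when $\tau \lesssim 1/\sqrt{L}$ and can fail for substantially larger scalings.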

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data

no code implementations29 May 2019 Shicong Cen, Huishuai Zhang, Yuejie Chi, Wei Chen, Tie-Yan Liu

Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary.

G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations ICLR 2019 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks, so as to better facilitate the optimization process?

Optimization on Multiple Manifolds

no code implementations ICLR 2019 Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Optimization on manifolds has been widely used in machine learning to handle optimization problems with constraints.

Stabilize Deep ResNet with A Sharp Scaling Factor $\tau$

1 code implementation17 Mar 2019 Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu

Moreover, for ResNets with a normalization layer, adding such a factor $\tau$ also stabilizes the training and yields significant performance gains for deep ResNets.
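A toy NumPy sketch of such a scaled residual stack, with $\tau$ defaulting to $1/\sqrt{L}$ as the analysis above suggests; the fully-connected ReLU branch is an illustrative simplification, not the paper's architecture.

```python
import numpy as np

def resnet_forward(x, weights, tau=None):
    """Forward pass of a toy fully-connected ResNet with scaled residual branches.

    weights: list of L square matrices; tau defaults to 1/sqrt(L), the scaling
    suggested by the stability analysis above.
    """
    L = len(weights)
    tau = tau if tau is not None else 1.0 / np.sqrt(L)
    for W in weights:
        x = x + tau * np.maximum(x @ W, 0.0)   # residual branch: ReLU(x W)
    return x
```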

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

no code implementations ICLR 2019 Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks.

On the Local Hessian in Back-propagation

no code implementations NeurIPS 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

We study the Hessian of the local back-matching loss (local Hessian) and connect it to the efficiency of BP.

Capacity Control of ReLU Neural Networks by Basis-path Norm

no code implementations19 Sep 2018 Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu

Motivated by this, we propose a new norm, the Basis-path Norm, based on a group of linearly independent paths to measure the capacity of neural networks more accurately.

Train Feedforward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation

no code implementations27 Feb 2018 Huishuai Zhang, Wei Chen, Tie-Yan Liu

This inconsistency of gradient magnitudes across different layers renders optimization of a deep neural network with a single learning rate problematic.

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

no code implementations19 Feb 2018 Yi Zhou, Yingbin Liang, Huishuai Zhang

With strongly convex regularizers, we further establish the generalization error bounds for nonconvex loss functions under proximal SGD with a high-probability guarantee, i.e., exponential concentration in probability.

$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations11 Feb 2018 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks, so as to better facilitate the optimization process?

Block-diagonal Hessian-free Optimization for Training Neural Networks

no code implementations ICLR 2018 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence.

Second-order methods

Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent

no code implementations23 Sep 2017 Yuanxin Li, Yuejie Chi, Huishuai Zhang, Yingbin Liang

Recent work has demonstrated the effectiveness of gradient descent for directly recovering the factors of low-rank matrices from random linear measurements in a globally convergent manner when initialized properly.

Reshaped Wirtinger Flow for Solving Quadratic System of Equations

no code implementations NeurIPS 2016 Huishuai Zhang, Yingbin Liang

In contrast to the smooth loss function used in WF, we adopt a nonsmooth but lower-order loss function, and design a gradient-like algorithm (referred to as reshaped-WF).
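For reference (notation mine), the lower-order loss referred to here is of the form
$$\ell(z) = \frac{1}{2m}\sum_{i=1}^{m}\big(|a_i^{\top} z| - y_i\big)^2,\qquad y_i = |a_i^{\top} x|,$$
which is nonsmooth where $a_i^{\top} z = 0$ but only quadratic (rather than quartic) in $z$, and reshaped-WF minimizes it with gradient-like steps from a spectral initialization.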

Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations

1 code implementation25 May 2016 Huishuai Zhang, Yi Zhou, Yingbin Liang, Yuejie Chi

We further develop the incremental (stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges linearly to the true signal.

Retrieval

Median-Truncated Nonconvex Approach for Phase Retrieval with Outliers

no code implementations11 Mar 2016 Huishuai Zhang, Yuejie Chi, Yingbin Liang

This paper investigates the phase retrieval problem, which aims to recover a signal from the magnitudes of its linear measurements.

Retrieval

Analysis of Robust PCA via Local Incoherence

no code implementations NeurIPS 2015 Huishuai Zhang, Yi Zhou, Yingbin Liang

We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank matrix and a sparse error matrix via the convex program Principal Component Pursuit (PCP).
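The PCP program referenced here is the standard convex formulation
$$\min_{L,S}\;\|L\|_* + \lambda\,\|S\|_1\quad\text{s.t.}\quad L + S = M,$$
where $\|\cdot\|_*$ is the nuclear norm, $\|\cdot\|_1$ is the entrywise $\ell_1$ norm, and $M$ is the observed matrix.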
