Search Results for author: Guang Cheng

Found 79 papers, 15 papers with code

Mutual Transfer Learning for Massive Data

no code implementations · ICML 2020 · Ching-Wei Cheng, Xingye Qiao, Guang Cheng

In this article, we study a new paradigm called mutual transfer learning, in which, among many heterogeneous data domains, every domain could potentially be the target of interest and could also serve as a useful source for learning in other domains.

Transfer Learning

Minimax Optimal Fair Classification with Bounded Demographic Disparity

1 code implementation · 27 Mar 2024 · Xianli Zeng, Guang Cheng, Edgar Dobriban

Mitigating the disparate impact of statistical machine learning methods is crucial for ensuring fairness.

Binary Classification, Classification

Approximation of RKHS Functionals by Neural Networks

no code implementations · 18 Mar 2024 · Tian-Yi Zhou, Namjoon Suh, Guang Cheng, Xiaoming Huo

Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to $\mathbb{R}$ (i.e., functionals).

Regression, Time Series

FairRR: Pre-Processing for Group Fairness through Randomized Response

1 code implementation · 12 Mar 2024 · Xianli Zeng, Joshua Ward, Guang Cheng

The increasing usage of machine learning models in consequential decision-making processes has spurred research into the fairness of these systems.

Decision Making, Fairness

Rate-Optimal Rank Aggregation with Private Pairwise Rankings

no code implementations · 26 Feb 2024 · Shirong Xu, Will Wei Sun, Guang Cheng

Motivated by this, we propose a debiased randomized response mechanism to protect the raw pairwise rankings, ensuring consistent estimation of true preferences and rankings in downstream rank aggregation.

Recommendation Systems
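The entry above mentions a debiased randomized response mechanism for pairwise rankings. As a hedged illustration only, the Python sketch below shows classical (Warner-style) randomized response applied to a single binary pairwise comparison, followed by the standard debiasing step; it is not the mechanism from the paper. The keep probability $e^\epsilon/(1+e^\epsilon)$, the function names, and the simulated preference rate of 0.7 are assumptions chosen for illustration.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true comparison bit with probability e^eps / (1 + e^eps), else flip it."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_keep else 1 - bit


def debiased_rate(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the share of users preferring item i over item j."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_keep) + (2 * p_keep - 1) * true_rate, so invert that affine map.
    return (observed - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_rate, epsilon = 0.7, 1.0  # hypothetical values for the simulation
    truth = [1 if rng.random() < true_rate else 0 for _ in range(100_000)]
    reports = [randomized_response(b, epsilon, rng) for b in truth]
    print("naive:", round(sum(reports) / len(reports), 3))
    print("debiased:", round(debiased_rate(reports, epsilon), 3))
```

Because each report is truthful only with probability p, the raw share of 1s is pulled toward 1/2; inverting the affine map removes that bias at the cost of extra variance, which grows as epsilon shrinks.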

Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing

1 code implementation · 5 Feb 2024 · Xianli Zeng, Guang Cheng, Edgar Dobriban

To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints.

Attribute, Classification

Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data

no code implementations · 1 Feb 2024 · Yue Xing, Xiaofeng Lin, Namjoon Suh, Qifan Song, Guang Cheng

In practice, it is observed that transformer-based models can learn concepts in context in the inference stage.

In-Context Learning

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

no code implementations · 26 Jan 2024 · Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng

Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models.

Adversarial Robustness, Contrastive Learning

Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective

no code implementations · 1 Jan 2024 · Din-Yin Hsieh, Chi-Hua Wang, Guang Cheng

Exploring generative model training for synthetic tabular data, specifically in sequential contexts such as credit card transaction data, presents significant challenges.

Fraud Detection, Time Series

MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks

1 code implementation · 11 Dec 2023 · Yuyang Zhou, Guang Cheng, Zongyao Chen, Shui Yu

Experimental results on two Android malware datasets demonstrate that MalPurifier outperforms state-of-the-art defenses and significantly strengthens the vulnerable malware detector against 37 evasion attacks, achieving accuracies over 90.91%.

Android Malware Detection, Denoising

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

1 code implementation · 24 Oct 2023 · Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng

Diffusion models have become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis.

Language Modelling, Speech Synthesis

Improve Deep Forest with Learnable Layerwise Augmentation Policy Schedule

1 code implementation · 16 Sep 2023 · Hongyu Zhu, Sichu Liang, Wentao Hu, Fang-Qi Li, Yali Yuan, Shi-Lin Wang, Guang Cheng

As a modern ensemble technique, Deep Forest (DF) employs a cascading structure to construct deep models, providing stronger representational power compared to traditional decision forests.

AutoML, Data Augmentation

MissDiff: Training Diffusion Models on Tabular Data with Missing Values

no code implementations · 2 Jul 2023 · Yidong Ouyang, Liyan Xie, Chongxuan Li, Guang Cheng

The diffusion model has shown remarkable performance in modeling data distributions and synthesizing data.

Denoising

Utility Theory of Synthetic Data Generation

no code implementations · 17 May 2023 · Shirong Xu, Will Wei Sun, Guang Cheng

The former is defined as the generalization difference between models trained on synthetic and on real data.

Synthetic Data Generation

Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment Norm

no code implementations · 13 Mar 2023 · Huiming Zhang, Haoyu Wei, Guang Cheng

In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of paramount importance.

Double Matching Under Complementary Preferences

no code implementations · 24 Jan 2023 · Yuantong Li, Guang Cheng, Xiaowu Dai

In this paper, we propose a new algorithm for addressing the problem of matching markets with complementary preferences, where agents' preferences are unknown a priori and must be learned from data.

Thompson Sampling

Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms

no code implementations · 21 Jan 2023 · Ximing Li, Chendi Wang, Guang Cheng

To complete the picture, we establish a lower bound for TV accuracy that holds for every $\epsilon$-DP synthetic data generator.

Ranking Differential Privacy

no code implementations · 2 Jan 2023 · Shirong Xu, Will Wei Sun, Guang Cheng

This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy.

Inference Attack

On the Utility Recovery Incapability of Neural Net-based Differential Private Tabular Training Data Synthesizer under Privacy Deregulation

no code implementations · 28 Nov 2022 · Yucong Liu, Chi-Hua Wang, Guang Cheng

Devising procedures for auditing generative model privacy-utility tradeoff is an important yet unresolved problem in practice.

Improving Adversarial Robustness by Contrastive Guided Diffusion Process

no code implementations · 18 Oct 2022 · Yidong Ouyang, Liyan Xie, Guang Cheng

Among various deep generative models, the diffusion model has been shown to produce high-quality synthetic images and has achieved good performance in improving adversarial robustness.

Adversarial Robustness, Synthetic Data Generation

Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

1 code implementation · 12 Oct 2022 · Zhanyu Wang, Guang Cheng, Jordan Awan

Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure.

Regression
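The entry above describes DP mechanisms as injecting randomness into the statistical analysis. A minimal sketch of one standard such mechanism, the Laplace mechanism applied to a clipped mean, follows; it is a generic textbook example, not the bootstrap procedure studied in the paper. It assumes that neighboring datasets differ by replacing one record and that the sample size is public; the clipping bounds, function names, and data are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower: float, upper: float, epsilon: float, rng: random.Random) -> float:
    """epsilon-DP estimate of the mean of `values`, clipped to [lower, upper]."""
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.random() for _ in range(10_000)]  # hypothetical data in [0, 1]
    print(private_mean(data, 0.0, 1.0, epsilon=0.5, rng=rng))
```

Clipping is what bounds the sensitivity; without known bounds a single extreme record could move the mean arbitrarily, and no finite noise scale would give a privacy guarantee.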

Fair Bayes-Optimal Classifiers Under Predictive Parity

1 code implementation · 15 May 2022 · Xianli Zeng, Edgar Dobriban, Guang Cheng

This paper considers predictive parity, which requires equalizing the probability of success given a positive prediction among different protected groups.

Rate-Optimal Contextual Online Matching Bandit

no code implementations · 7 May 2022 · Yuantong Li, Chi-Hua Wang, Guang Cheng, Will Wei Sun

Existing works focus on multi-armed bandits with static preferences, but this is insufficient: the two-sided preference changes as long as one side's contextual information updates, resulting in non-static matching.

Federated Online Sparse Decision Making

no code implementations · 27 Feb 2022 · Chi-Hua Wang, Wenjie Li, Guang Cheng, Guang Lin

This paper presents a novel federated linear contextual bandits model, where individual clients face different K-armed stochastic bandits with high-dimensional decision contexts and are coupled through common global parameters.

Decision Making, Multi-Armed Bandits

Enhanced Nearest Neighbor Classification for Crowdsourcing

no code implementations · 26 Feb 2022 · Jiexin Duan, Xingye Qiao, Guang Cheng

In machine learning, crowdsourcing is an economical way to label a large amount of data.

Classification

Optimal Convergence Rates of Deep Convolutional Neural Networks: Additive Ridge Functions

no code implementations · 24 Feb 2022 · Zhiying Fang, Guang Cheng

Convolutional neural networks have shown impressive abilities in many applications, especially those related to classification tasks.

Regression

Attention Enables Zero Approximation Error

no code implementations · 24 Feb 2022 · Zhiying Fang, Yidong Ouyang, Ding-Xuan Zhou, Guang Cheng

In this work, we show that with suitable adaptations, the single-head self-attention transformer with a fixed number of transformer encoder blocks and free parameters is able to generate any desired polynomial of the input with no error.

Image Classification

Residual Bootstrap Exploration for Stochastic Linear Bandit

no code implementations · 23 Feb 2022 · Shuang Wu, Chi-Hua Wang, Yuantong Li, Guang Cheng

We propose a new bootstrap-based online algorithm for stochastic linear bandit problems.

Computational Efficiency

Benefit of Interpolation in Nearest Neighbor Algorithms

no code implementations · 23 Feb 2022 · Yue Xing, Qifan Song, Guang Cheng

In some studies of deep learning (e.g., Zhang et al., 2016), it is observed that over-parametrized deep neural networks achieve a small testing error even when the training error is almost zero.

Bayes-Optimal Classifiers under Group Fairness

1 code implementation · 20 Feb 2022 · Xianli Zeng, Edgar Dobriban, Guang Cheng

Machine learning algorithms are becoming integrated into more and more high-stakes decision-making processes, such as in social welfare issues.

BIG-bench Machine Learning, Decision Making

Unlabeled Data Help: Minimax Analysis and Adversarial Robustness

no code implementations · 14 Feb 2022 · Yue Xing, Qifan Song, Guang Cheng

The recently proposed self-supervised learning (SSL) approaches successfully demonstrate the great potential of supplementing learning algorithms with additional unlabeled data.

Adversarial Robustness, Self-Supervised Learning

High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees

no code implementations · 21 Jan 2022 · Ying Sun, Marie Maros, Gesualdo Scutari, Guang Cheng

Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, and under suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model, $O(s\log d/N)$.

Vocal Bursts Intensity Prediction

On the Algorithmic Stability of Adversarial Training

no code implementations · NeurIPS 2021 · Yue Xing, Qifan Song, Guang Cheng

In contrast, this paper studies the algorithmic stability of a generic adversarial training algorithm, which can further help to establish an upper bound for generalization error.

Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning

no code implementations · 8 Aug 2021 · Pratik Ramprasad, Yuantong Li, Zhuoran Yang, Zhaoran Wang, Will Wei Sun, Guang Cheng

The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms.

Reinforcement Learning (RL)

Optimum-statistical Collaboration Towards General and Efficient Black-box Optimization

1 code implementation · 17 Jun 2021 · Wenjie Li, Chi-Hua Wang, Guang Cheng, Qifan Song

In this paper, we delineate the roles of resolution and statistical uncertainty in hierarchical bandit-based black-box optimization algorithms, guiding a more general analysis and a more efficient algorithm design.

Distributed Bootstrap for Simultaneous Inference Under High Dimensionality

1 code implementation · 19 Feb 2021 · Yang Yu, Shih-Kang Chao, Guang Cheng

We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines.

Vocal Bursts Intensity Prediction

On the Marginal Regret Bound Minimization of Adaptive Methods

no code implementations · 1 Jan 2021 · Wenjie Li, Guang Cheng

Numerous adaptive algorithms such as AMSGrad and RAdam have recently been proposed and applied to deep learning.

Open-Ended Question Answering

Variance Reduction on General Adaptive Stochastic Mirror Descent

no code implementations · 26 Dec 2020 · Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng

In this work, we investigate the idea of variance reduction by studying its properties with general adaptive mirror descent algorithms in nonsmooth nonconvex finite-sum optimization problems.

Adversarially Robust Estimate and Risk Analysis in Linear Regression

no code implementations · 18 Dec 2020 · Yue Xing, Ruizhi Zhang, Guang Cheng

Further, we reveal an explicit connection between adversarial and standard estimates, and propose a straightforward two-stage adversarial learning framework that makes it possible to use model structure information to improve adversarial robustness.

Adversarial Robustness, Regression

Online Forgetting Process for Linear Regression Models

no code implementations · 3 Dec 2020 · Yuantong Li, Chi-Hua Wang, Guang Cheng

Motivated by the EU's "Right To Be Forgotten" regulation, we initiate a study of statistical data deletion problems where users' data are accessible only for a limited period of time.

Regression

Statistical Guarantees of Distributed Nearest Neighbor Classification

no code implementations · NeurIPS 2020 · Jiexin Duan, Xingye Qiao, Guang Cheng

It is interesting to note that the weighted voting scheme allows a larger number of subsamples than the majority voting one.

Classification, General Classification
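The entry above contrasts weighted voting with majority voting in distributed nearest neighbor classification. The Python sketch below runs k-NN independently on disjoint subsamples (standing in for machines) and combines the local results either way. Reading "weighted voting" as averaging the local vote fractions is an assumption about the scheme, not the paper's definition, and the data and function names are hypothetical.

```python
import math
import random


def knn_vote_fraction(train, x, k):
    """Share of the k nearest training points (Euclidean distance) labelled 1."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(label for _, label in nearest) / k


def distributed_knn(data, x, k, num_machines, rng, weighted=False):
    """Predict a 0/1 label for x by running k-NN separately on disjoint subsamples."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    # One disjoint subsample per (simulated) machine.
    parts = [shuffled[i::num_machines] for i in range(num_machines)]
    fractions = [knn_vote_fraction(part, x, k) for part in parts]
    if weighted:
        # Soft aggregation (assumed reading of "weighted voting"): average the local
        # vote fractions, then threshold at 1/2.
        return int(sum(fractions) / num_machines > 0.5)
    # Majority voting: each machine casts a hard 0/1 prediction; ties go to class 0.
    ones = sum(f > 0.5 for f in fractions)
    return int(ones > num_machines - ones)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical two-class data: class 0 near (0, 0), class 1 near (5, 5).
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(100)]
    data += [((rng.gauss(5, 1), rng.gauss(5, 1)), 1) for _ in range(100)]
    print(distributed_knn(data, (4.5, 5.0), k=3, num_machines=5, rng=rng))
```

Only the local vote fractions (or hard labels) travel between machines, so communication per query is one number per machine regardless of subsample size.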

Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee

1 code implementation · NeurIPS 2020 · Jincheng Bai, Qifan Song, Guang Cheng

Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks, and to recover the sparse structure of target functions.

Uncertainty Quantification, Variable Selection

Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors

no code implementations · 24 Oct 2020 · Jincheng Bai, Qifan Song, Guang Cheng

We propose a variational Bayesian (VB) procedure for high-dimensional linear model inference with heavy-tailed shrinkage priors, such as the Student-t prior.

Computational Efficiency, Regression

On the Generalization Properties of Adversarial Training

no code implementations · 15 Aug 2020 · Yue Xing, Qifan Song, Guang Cheng

Modern machine learning and deep learning models are shown to be vulnerable when testing data are slightly perturbed.

Adversarial Robustness

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

no code implementations · 6 Jul 2020 · Tianyang Hu, Wenjia Wang, Cong Lin, Guang Cheng

Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data.

Online Regularization towards Always-Valid High-Dimensional Dynamic Pricing

no code implementations · 5 Jul 2020 · Chi-Hua Wang, Zhanyu Wang, Will Wei Sun, Guang Cheng

In this paper, we propose a novel approach for designing dynamic pricing policies based on regularized online statistical learning with theoretical guarantees.

Valid, Vocal Bursts Intensity Prediction

Directional Pruning of Deep Neural Networks

1 code implementation · NeurIPS 2020 · Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng

Given that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method that searches for a sparse minimizer in or close to that flat region.

On Deep Instrumental Variables Estimate

no code implementations · 30 Apr 2020 · Ruiqi Liu, Zuofeng Shang, Guang Cheng

The endogeneity issue is fundamentally important as many empirical applications may suffer from the omission of explanatory variables, measurement error, or simultaneous causality.

Online Batch Decision-Making with High-Dimensional Covariates

no code implementations · 21 Feb 2020 · Chi-Hua Wang, Guang Cheng

In such a scenario, our goal is to allocate a batch of treatments to maximize treatment efficacy based on observed high-dimensional user covariates.

Decision Making, Marketing

Residual Bootstrap Exploration for Bandit Algorithms

no code implementations · 19 Feb 2020 · Chi-Hua Wang, Yang Yu, Botao Hao, Guang Cheng

In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (ReBoot).

Computational Efficiency, Multi-Armed Bandits

Simultaneous Inference for Massive Data: Distributed Bootstrap

no code implementations · ICML 2020 · Yang Yu, Shih-Kang Chao, Guang Cheng

In this paper, we propose a bootstrap method for massive data processed in a distributed manner across a large number of machines.

Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

no code implementations · 13 Feb 2020 · Yue Xing, Qifan Song, Guang Cheng

We consider a data corruption scenario in the classical $k$ Nearest Neighbors ($k$-NN) algorithm, that is, the testing data are randomly perturbed.

Sharp Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting

no code implementations · 19 Jan 2020 · Tianyang Hu, Zuofeng Shang, Guang Cheng

In this paper, we attempt to understand this empirical success in high dimensional classification by deriving the convergence rates of excess risk.

General Classification

Benefit of Interpolation in Nearest Neighbor Algorithms

no code implementations · 25 Sep 2019 · Yue Xing, Qifan Song, Guang Cheng

Over-parameterized models have attracted much attention in the era of data science and deep learning.

A generalization of regularized dual averaging and its dynamics

no code implementations · 22 Sep 2019 · Shih-Kang Chao, Guang Cheng

Preliminary empirical analysis of modern image data shows that learning very sparse deep neural networks by gRDA does not necessarily sacrifice testing accuracy.

Machine Learning in/for Blockchain: Future and Challenges

no code implementations · 12 Sep 2019 · Fang Chen, Hong Wan, Hua Cai, Guang Cheng

Machine learning and blockchain are two of the most prominent technologies of recent years.

BIG-bench Machine Learning

Bootstrapping Upper Confidence Bound

no code implementations · NeurIPS 2019 · Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

The Upper Confidence Bound (UCB) method is arguably the most celebrated approach to online decision making with partial information feedback.

Decision Making, Multi-Armed Bandits
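For reference, the entry above builds on the UCB method. The sketch below is the classical UCB1 index policy for Bernoulli rewards (empirical mean plus a $\sqrt{2\ln t / n_a}$ exploration bonus, where $n_a$ is the number of pulls of arm $a$), not the bootstrapped variant the paper proposes; the arm means and horizon in the usage line are made-up values.

```python
import math
import random


def ucb1(arm_means, horizon, rng):
    """Play UCB1 on Bernoulli arms for `horizon` rounds; return how often each arm was pulled."""
    n_arms = len(arm_means)
    counts = [0] * n_arms
    totals = [0.0] * n_arms

    def index(a, t):
        # Empirical mean plus an exploration bonus that shrinks as arm a is pulled more.
        return totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(range(n_arms), key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    # Hypothetical arm means; the third arm is best.
    print(ucb1([0.2, 0.5, 0.8], horizon=5_000, rng=random.Random(0)))
```

The bonus term is what makes UCB "optimistic": an arm that has been pulled rarely keeps a wide confidence bound and is eventually revisited, while clearly inferior arms are pulled only logarithmically often.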

Building an Efficient Intrusion Detection System Based on Feature Selection and Ensemble Classifier

no code implementations · 2 Apr 2019 · Yuyang Zhou, Guang Cheng, Shanqing Jiang, Mian Dai

An intrusion detection system (IDS) is one of the most extensively used techniques in a network topology to safeguard the integrity and availability of sensitive assets in protected systems.

Anomaly Detection, Dimensionality Reduction

Stein Neural Sampler

1 code implementation · 8 Oct 2018 · Tianyang Hu, Zixiang Chen, Hanxi Sun, Jincheng Bai, Mao Ye, Guang Cheng

We propose two novel samplers to generate high-quality samples from a given (un-normalized) probability density.

Statistically and Computationally Efficient Variance Estimator for Kernel Ridge Regression

no code implementations · 17 Sep 2018 · Meimei Liu, Jean Honorio, Guang Cheng

In this paper, we propose a random projection approach to estimate variance in kernel ridge regression.

Regression

How Many Machines Can We Use in Parallel Computing for Kernel Ridge Regression?

no code implementations · 25 May 2018 · Meimei Liu, Zuofeng Shang, Guang Cheng

It is worth noting that the upper bounds on the number of machines are proven to be unimprovable (up to a logarithmic factor) in two important cases: smoothing spline regression and Gaussian RKHS regression.

Regression, Two-sample testing

Early Stopping for Nonparametric Testing

no code implementations · NeurIPS 2018 · Meimei Liu, Guang Cheng

Early stopping of iterative algorithms is an algorithmic regularization method to avoid over-fitting in estimation and classification.

General Classification

Nonparametric Testing under Random Projection

no code implementations · 17 Feb 2018 · Meimei Liu, Zuofeng Shang, Guang Cheng

A common challenge in nonparametric inference is its high computational complexity when data volume is large.

Regression

Sparse and Low-rank Tensor Estimation via Cubic Sketchings

no code implementations · 29 Jan 2018 · Botao Hao, Anru Zhang, Guang Cheng

In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings.

Regression, Tensor Decomposition

Stability Enhanced Large-Margin Classifier Selection

no code implementations · 20 Jan 2017 · Will Wei Sun, Guang Cheng, Yufeng Liu

Stability is an important aspect of a classification procedure because unstable predictions can potentially reduce users' trust in a classification system and also harm the reproducibility of scientific conclusions.

Classification, General Classification

Simultaneous Clustering and Estimation of Heterogeneous Graphical Models

no code implementations · 28 Nov 2016 · Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng

We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations.

Clustering, Sparse Learning

Tensor Graphical Model: Non-convex Optimization and Statistical Inference

no code implementations · 15 Sep 2016 · Xiang Lyu, Will Wei Sun, Zhaoran Wang, Han Liu, Jian Yang, Guang Cheng

We consider the estimation and inference of graphical models that characterize the dependency structure of high-dimensional tensor-valued data.

Computational Limits of A Distributed Algorithm For Smoothing Spline

no code implementations · 31 Dec 2015 · Zuofeng Shang, Guang Cheng

In this paper, we explore the statistical versus computational trade-off to address a basic question in applying a distributed algorithm: what is the minimal computational cost of obtaining statistical optimality?

Statistics Theory

Non-convex Statistical Optimization for Sparse Tensor Graphical Model

no code implementations · NeurIPS 2015 · Wei Sun, Zhaoran Wang, Han Liu, Guang Cheng

We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data.

Provable Sparse Tensor Decomposition

no code implementations · 5 Feb 2015 · Will Wei Sun, Junwei Lu, Han Liu, Guang Cheng

We propose a novel sparse tensor decomposition method, namely Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components.

Click-Through Rate Prediction, Clustering

Stabilized Nearest Neighbor Classifier and Its Statistical Properties

no code implementations · 26 May 2014 · Wei Sun, Xingye Qiao, Guang Cheng

In this paper, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method.

Classification, General Classification
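The classification instability (CIS) measure described in the entry above can be approximated by simulation. The sketch below assumes that CIS is the expected probability that two classifiers from the same method, trained on independent samples of the same size, disagree at a new point X; that reading, the data-generating model, and all function names are illustrative assumptions rather than the paper's code.

```python
import random


def draw_sample(n, rng):
    # Hypothetical model: X ~ Uniform(0, 1) and P(Y = 1 | X = x) = x.
    return [(x, int(rng.random() < x)) for x in (rng.random() for _ in range(n))]


def knn_predict(train, x, k):
    """Majority-vote k-NN label for a scalar x (use odd k to avoid ties)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return int(2 * sum(y for _, y in nearest) > k)


def estimate_cis(n, k, reps, n_test, rng):
    """Monte Carlo estimate of E[P_X(phi_1(X) != phi_2(X))], where phi_1 and phi_2
    are k-NN classifiers trained on two independent samples of size n."""
    total = 0.0
    for _ in range(reps):
        d1, d2 = draw_sample(n, rng), draw_sample(n, rng)
        test_x = [rng.random() for _ in range(n_test)]
        disagree = sum(knn_predict(d1, x, k) != knn_predict(d2, x, k) for x in test_x)
        total += disagree / n_test
    return total / reps


if __name__ == "__main__":
    rng = random.Random(0)
    for k in (1, 11, 51):
        print(k, round(estimate_cis(200, k, reps=20, n_test=200, rng=rng), 3))
```

In this toy setting, increasing k should shrink the estimate, since each prediction then averages over more labels and depends less on the particular training sample.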

Local and global asymptotic inference in smoothing spline models

no code implementations · 30 Dec 2012 · Zuofeng Shang, Guang Cheng

In particular, our confidence intervals are proved to be asymptotically valid at any point in the support, and they are shorter on average than the Bayesian confidence intervals proposed by Wahba [J. R. Stat.

Math, Valid
