no code implementations • ICML 2020 • Ching-Wei Cheng, Xingye Qiao, Guang Cheng
In this article, we study a new paradigm called mutual transfer learning: among many heterogeneous data domains, every domain can be the target of interest, and each can also serve as a useful source to aid learning in the other domains.
no code implementations • 16 Nov 2024 • Bochao Gu, Hengzhi He, Guang Cheng
In this paper, we propose a novel statistical framework for watermarking generative categorical data.
no code implementations • 31 Oct 2024 • Tung Sum Thomas Kwok, Chi-Hua Wang, Guang Cheng
Data collaboration via Data Clean Room offers value but raises privacy concerns, which can be addressed through synthetic data and multi-table synthesizers.
no code implementations • 12 Oct 2024 • Zhangjie Xia, Chi-Hua Wang, Guang Cheng
We therefore present the perfect deleted point problem for 1-step noisy SGD in the classical linear regression task, which aims to find the point in the training dataset whose deletion leaves the resulting model identical to the one trained on the full dataset.
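As a minimal sketch of the setting (not the paper's algorithm), the snippet below runs one full-batch noisy gradient step on a linear regression task with and without each training point, and picks the point whose deletion moves the parameters the least; the data, step size, and noise scale are illustrative assumptions.

```python
import numpy as np

def one_step_sgd(X, y, w0, lr, noise):
    # One gradient step on the squared loss, plus a fixed injected noise vector
    grad = X.T @ (X @ w0 - y) / len(y)
    return w0 - lr * grad - lr * noise

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

w0 = np.zeros(3)
noise = rng.normal(scale=0.01, size=3)  # same noise reused for comparability
w_full = one_step_sgd(X, y, w0, lr=0.1, noise=noise)

# Candidate "perfect deleted point": the index whose removal perturbs the
# one-step model the least (exact equality is the idealized target)
gaps = []
for i in range(len(y)):
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    w_del = one_step_sgd(Xi, yi, w0, lr=0.1, noise=noise)
    gaps.append(np.linalg.norm(w_del - w_full))
best = int(np.argmin(gaps))
```

Reusing the same noise vector in both runs isolates the effect of the deletion itself from the randomness of the update.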
1 code implementation • 23 Jun 2024 • Namjoon Suh, Yuning Yang, Din-Yin Hsieh, Qitong Luan, Shirong Xu, Shixiang Zhu, Guang Cheng
In this paper, we leverage the power of latent diffusion models to generate synthetic time series tabular data.
no code implementations • 19 Jun 2024 • Yu Xia, Chi-Hua Wang, Joshua Mabry, Guang Cheng
This paper introduces a comprehensive framework for assessing synthetic retail data, focusing on fidelity, utility, and privacy.
no code implementations • 18 Jun 2024 • Joshua Ward, Chi-Hua Wang, Guang Cheng
The promise of tabular generative models is to produce realistic synthetic data that can be shared and safely used without dangerous leakage of information from the training set.
no code implementations • 7 Jun 2024 • Xiaofeng Lin, Chenheng Xu, Matthew Yang, Guang Cheng
Generative Foundation Models (GFMs) have produced synthetic data with remarkable quality in modalities such as images and text.
no code implementations • 4 Jun 2024 • Yuantong Li, Guang Cheng, Xiaowu Dai
Recommender systems play a crucial role in internet economies by connecting users with relevant products or services.
no code implementations • 27 May 2024 • Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu
To this end, we formulate a learnable energy-based latent space, and propose a Noise-intensified Telescoping density-Ratio Estimation (NTRE) scheme for variational learning of an accurate latent space model without costly Markov chain Monte Carlo.
no code implementations • 27 May 2024 • Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng
Consequently, various finetuning and regularization approaches have been proposed to transfer knowledge from existing pre-trained models to specific target domains with limited data.
no code implementations • 24 May 2024 • Lan Tao, Shirong Xu, Chi-Hua Wang, Namjoon Suh, Guang Cheng
In particular, this paper establishes theoretical results regarding the convergence rate of the estimation error of TV distance between two Gaussian distributions.
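For intuition about the quantity whose estimation error is analyzed, the TV distance between two Gaussians with equal variance has a well-known closed form, $\mathrm{TV} = 2\Phi(|\mu_1-\mu_2|/(2\sigma)) - 1$; the sketch below is a standalone illustration of that formula, not the paper's estimator.

```python
import math

def tv_gaussians_equal_var(mu1, mu2, sigma):
    """Exact TV distance between N(mu1, sigma^2) and N(mu2, sigma^2):
    TV = 2 * Phi(|mu1 - mu2| / (2 * sigma)) - 1, with Phi the standard
    normal CDF (computed here via the error function)."""
    z = abs(mu1 - mu2) / (2 * sigma)
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * Phi - 1
```

The distance is 0 when the means coincide and approaches 1 as the means separate relative to the common standard deviation.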
no code implementations • 24 May 2024 • Chi-Hua Wang, Guang Cheng
We present BadGD, a unified theoretical framework that exposes the vulnerabilities of gradient descent algorithms through strategic backdoor attacks.
no code implementations • 5 May 2024 • Zhaiming Shen, Menglun Wang, Guang Cheng, Ming-Jun Lai, Lin Mu, Ruihao Huang, Qi Liu, Hao Zhu
In this paper, we propose TOOD detection, a simple yet effective tree-based out-of-distribution (TOOD) detection mechanism to determine whether a set of unseen samples has a distribution similar to that of the training samples.
1 code implementation • 27 Mar 2024 • Xianli Zeng, Guang Cheng, Edgar Dobriban
Mitigating the disparate impact of statistical machine learning methods is crucial for ensuring fairness.
no code implementations • 18 Mar 2024 • Tian-Yi Zhou, Namjoon Suh, Guang Cheng, Xiaoming Huo
Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to $\mathbb{R}$ (i.e., functionals).
1 code implementation • 12 Mar 2024 • Xianli Zeng, Joshua Ward, Guang Cheng
The increasing usage of machine learning models in consequential decision-making processes has spurred research into the fairness of these systems.
no code implementations • 26 Feb 2024 • Shirong Xu, Will Wei Sun, Guang Cheng
Motivated by this, we propose to adaptively debias the rankings from the randomized response mechanism, ensuring consistent estimation of true preferences and enhancing the utility of downstream rank aggregation.
1 code implementation • 5 Feb 2024 • Xianli Zeng, Guang Cheng, Edgar Dobriban
To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints.
no code implementations • 1 Feb 2024 • Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng
We observe that (1) a transformer with two layers of (self-)attentions with a look-ahead attention mask can learn from the prompt in the unstructured data, and (2) positional encoding can match the $x_i$ and $y_i$ tokens to achieve a better ICL performance.
no code implementations • 26 Jan 2024 • Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng
Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models.
no code implementations • 14 Jan 2024 • Namjoon Suh, Guang Cheng
In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics and generative models.
no code implementations • 1 Jan 2024 • Yinan Cheng, Chi-Hua Wang, Vamsi K. Potluru, Tucker Balch, Guang Cheng
Devising procedures for downstream task-oriented generative model selections is an unresolved problem of practical importance.
no code implementations • 1 Jan 2024 • Din-Yin Hsieh, Chi-Hua Wang, Guang Cheng
Exploring generative model training for synthetic tabular data, specifically in sequential contexts such as credit card transaction data, presents significant challenges.
1 code implementation • 11 Dec 2023 • Yuyang Zhou, Guang Cheng, Zongyao Chen, Shui Yu
Experimental results on two Android malware datasets demonstrate that MalPurifier outperforms the state-of-the-art defenses, and it significantly strengthens the vulnerable malware detector against 37 evasion attacks, achieving accuracies over 90.91%.
1 code implementation • 24 Oct 2023 • Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng
The diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis.
1 code implementation • 16 Sep 2023 • Hongyu Zhu, Sichu Liang, Wentao Hu, Fang-Qi Li, Yali Yuan, Shi-Lin Wang, Guang Cheng
As a modern ensemble technique, Deep Forest (DF) employs a cascading structure to construct deep models, providing stronger representational power compared to traditional decision forests.
no code implementations • 2 Jul 2023 • Yidong Ouyang, Liyan Xie, Chongxuan Li, Guang Cheng
The diffusion model has shown remarkable performance in modeling data distributions and synthesizing data.
no code implementations • 17 May 2023 • Shirong Xu, Will Wei Sun, Guang Cheng
The former is defined as the generalization difference between models trained on synthetic and on real data.
no code implementations • 13 Mar 2023 • Huiming Zhang, Haoyu Wei, Guang Cheng
In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of paramount importance.
1 code implementation • 24 Jan 2023 • Yuantong Li, Guang Cheng, Xiaowu Dai
In this paper, we propose a new recommendation algorithm for addressing the problem of two-sided online matching markets with complementary preferences and quota constraints, where agents' preferences are unknown a priori and must be learned from data.
no code implementations • 21 Jan 2023 • Ximing Li, Chendi Wang, Guang Cheng
To complete the picture, we establish a lower bound for TV accuracy that holds for every $\epsilon$-DP synthetic data generator.
no code implementations • 2 Jan 2023 • Shirong Xu, Will Wei Sun, Guang Cheng
This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy.
no code implementations • 28 Nov 2022 • Yucong Liu, Chi-Hua Wang, Guang Cheng
Devising procedures for auditing generative model privacy-utility tradeoff is an important yet unresolved problem in practice.
no code implementations • 18 Oct 2022 • Yidong Ouyang, Liyan Xie, Guang Cheng
Among various deep generative models, the diffusion model has been shown to produce high-quality synthetic images and has achieved good performance in improving the adversarial robustness.
1 code implementation • 12 Oct 2022 • Zhanyu Wang, Guang Cheng, Jordan Awan
For the composition of the DP bootstrap, we present a numerical method to compute the exact privacy cost of releasing multiple DP bootstrap estimates. Using the Gaussian-DP (GDP) framework (Dong et al., 2022), we show that the release of $B$ DP bootstrap estimates from mechanisms satisfying $(\mu/\sqrt{(2-2/\mathrm{e})B})$-GDP asymptotically satisfies $\mu$-GDP as $B \to \infty$.
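As an arithmetic sketch of the stated scaling (assumptions: only the formula quoted above, nothing else from the paper), the per-release GDP parameter shrinks like $1/\sqrt{B}$, so the total budget stays controlled as more bootstrap estimates are released:

```python
import math

def per_release_gdp(mu_total, B):
    """Per-estimate GDP parameter mu0 = mu_total / sqrt((2 - 2/e) * B),
    under which, per the stated result, releasing B DP bootstrap
    estimates is asymptotically mu_total-GDP."""
    return mu_total / math.sqrt((2 - 2 / math.e) * B)

# The per-release budget decays like 1/sqrt(B)
budgets = {B: per_release_gdp(1.0, B) for B in (10, 100, 1000)}
```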
1 code implementation • 15 May 2022 • Xianli Zeng, Edgar Dobriban, Guang Cheng
This paper considers predictive parity, which requires equalizing the probability of success given a positive prediction among different protected groups.
no code implementations • 7 May 2022 • Yuantong Li, Chi-Hua Wang, Guang Cheng, Will Wei Sun
The key component of the proposed dynamic matching algorithm is an online estimation of the preference ranking with a statistical guarantee.
no code implementations • 27 Feb 2022 • Chi-Hua Wang, Wenjie Li, Guang Cheng, Guang Lin
This paper presents a novel federated linear contextual bandits model, where individual clients face different K-armed stochastic bandits with high-dimensional decision context and coupled through common global parameters.
no code implementations • 26 Feb 2022 • Jiexin Duan, Xingye Qiao, Guang Cheng
In machine learning, crowdsourcing is an economical way to label a large amount of data.
no code implementations • 24 Feb 2022 • Zhiying Fang, Yidong Ouyang, Ding-Xuan Zhou, Guang Cheng
In this work, we show that with suitable adaptations, the single-head self-attention transformer with a fixed number of transformer encoder blocks and free parameters is able to generate any desired polynomial of the input with no error.
no code implementations • 24 Feb 2022 • Zhiying Fang, Guang Cheng
Convolutional neural networks have shown impressive abilities in many applications, especially those related to the classification tasks.
no code implementations • 23 Feb 2022 • Shuang Wu, Chi-Hua Wang, Yuantong Li, Guang Cheng
We propose a new bootstrap-based online algorithm for stochastic linear bandit problems.
no code implementations • 23 Feb 2022 • Yue Xing, Qifan Song, Guang Cheng
In some studies of deep learning (e.g., Zhang et al., 2016), it is observed that over-parametrized deep neural networks achieve a small testing error even when the training error is almost zero.
1 code implementation • 20 Feb 2022 • Xianli Zeng, Edgar Dobriban, Guang Cheng
Machine learning algorithms are becoming integrated into more and more high-stakes decision-making processes, such as in social welfare issues.
no code implementations • 14 Feb 2022 • Yue Xing, Qifan Song, Guang Cheng
The recently proposed self-supervised learning (SSL) approaches successfully demonstrate the great potential of supplementing learning algorithms with additional unlabeled data.
no code implementations • 21 Jan 2022 • Ying Sun, Marie Maros, Gesualdo Scutari, Guang Cheng
Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, and suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model, $O(s\log d/N)$.
no code implementations • NeurIPS 2021 • Yue Xing, Qifan Song, Guang Cheng
In contrast, this paper studies the algorithmic stability of a generic adversarial training algorithm, which can further help to establish an upper bound for generalization error.
no code implementations • 8 Aug 2021 • Pratik Ramprasad, Yuantong Li, Zhuoran Yang, Zhaoran Wang, Will Wei Sun, Guang Cheng
The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms.
1 code implementation • 17 Jun 2021 • Wenjie Li, Chi-Hua Wang, Guang Cheng, Qifan Song
In this paper, we delineate the roles of resolution and statistical uncertainty in hierarchical bandits-based black-box optimization algorithms, guiding a more general analysis and a more efficient algorithm design.
1 code implementation • 19 Feb 2021 • Yang Yu, Shih-Kang Chao, Guang Cheng
We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines.
no code implementations • 1 Jan 2021 • Wenjie Li, Guang Cheng
Numerous adaptive algorithms such as AMSGrad and Radam have been proposed and applied to deep learning recently.
no code implementations • 26 Dec 2020 • Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng
In this work, we investigate the idea of variance reduction by studying its properties with general adaptive mirror descent algorithms in nonsmooth nonconvex finite-sum optimization problems.
no code implementations • 18 Dec 2020 • Yue Xing, Ruizhi Zhang, Guang Cheng
Further, we reveal an explicit connection between adversarial and standard estimates, and propose a straightforward two-stage adversarial learning framework, which facilitates using model structure information to improve adversarial robustness.
no code implementations • 3 Dec 2020 • Yuantong Li, Chi-Hua Wang, Guang Cheng
Motivated by the EU's "Right To Be Forgotten" regulation, we initiate a study of statistical data deletion problems where users' data are accessible only for a limited period of time.
no code implementations • NeurIPS 2020 • Jiexin Duan, Xingye Qiao, Guang Cheng
It is interesting to note that the weighted voting scheme allows a larger number of subsamples than the majority voting one.
1 code implementation • NeurIPS 2020 • Jincheng Bai, Qifan Song, Guang Cheng
Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks, and to recover the sparse structure of target functions.
no code implementations • 24 Oct 2020 • Jincheng Bai, Qifan Song, Guang Cheng
We propose a variational Bayesian (VB) procedure for high-dimensional linear model inferences with heavy tail shrinkage priors, such as student-t prior.
no code implementations • 15 Aug 2020 • Yue Xing, Qifan Song, Guang Cheng
Modern machine learning and deep learning models are shown to be vulnerable when testing data are slightly perturbed.
no code implementations • 6 Jul 2020 • Tianyang Hu, Wenjia Wang, Cong Lin, Guang Cheng
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data.
no code implementations • 5 Jul 2020 • Chi-Hua Wang, Zhanyu Wang, Will Wei Sun, Guang Cheng
In this paper, we propose a novel approach for designing dynamic pricing policies based on regularized online statistical learning with theoretical guarantees.
1 code implementation • NeurIPS 2020 • Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
In light of the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method which searches for a sparse minimizer in or close to that flat region.
no code implementations • 30 Apr 2020 • Ruiqi Liu, Zuofeng Shang, Guang Cheng
The endogeneity issue is fundamentally important as many empirical applications may suffer from the omission of explanatory variables, measurement error, or simultaneous causality.
no code implementations • 21 Feb 2020 • Chi-Hua Wang, Guang Cheng
In such a scenario, our goal is to allocate a batch of treatments to maximize treatment efficacy based on observed high-dimensional user covariates.
no code implementations • ICML 2020 • Yang Yu, Shih-Kang Chao, Guang Cheng
In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines.
no code implementations • 19 Feb 2020 • Chi-Hua Wang, Yang Yu, Botao Hao, Guang Cheng
In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (ReBoot).
no code implementations • 13 Feb 2020 • Yue Xing, Qifan Song, Guang Cheng
We consider a data corruption scenario in the classical $k$ Nearest Neighbors ($k$-NN) algorithm, that is, the testing data are randomly perturbed.
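A minimal sketch of this corruption scenario (illustrative data and noise scale, not the paper's analysis): a plain k-NN classifier is queried once on a clean test point and once on the same point after a random perturbation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # Plain k-NN: majority vote among the k nearest training points
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    return Counter(y_train[nn].tolist()).most_common(1)[0][0]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable toy labels

x_test = np.array([1.0, 1.0])
clean = knn_predict(X, y, x_test, k=5)
# Corruption scenario: the *test* point is randomly perturbed before prediction
noisy = knn_predict(X, y, x_test + 0.05 * rng.normal(size=2), k=5)
```

Note that the training data stay untouched; only the query point is perturbed, which is what distinguishes this setting from the usual training-time corruption models.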
no code implementations • 19 Jan 2020 • Tianyang Hu, Zuofeng Shang, Guang Cheng
In this paper, we attempt to understand this empirical success in high dimensional classification by deriving the convergence rates of excess risk.
no code implementations • 25 Sep 2019 • Yue Xing, Qifan Song, Guang Cheng
Over-parameterized models attract much attention in the era of data science and deep learning.
no code implementations • 22 Sep 2019 • Shih-Kang Chao, Guang Cheng
Preliminary empirical analysis of modern image data shows that learning very sparse deep neural networks by gRDA does not necessarily sacrifice testing accuracy.
no code implementations • 12 Sep 2019 • Fang Chen, Hong Wan, Hua Cai, Guang Cheng
Machine learning and blockchain are two of the most noticeable technologies in recent years.
1 code implementation • NeurIPS 2019 • Xingye Qiao, Jiexin Duan, Guang Cheng
Nearest neighbor is a popular class of classification methods with many desirable properties.
no code implementations • NeurIPS 2019 • Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng
The Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback.
no code implementations • 2 Apr 2019 • Yuyang Zhou, Guang Cheng, Shanqing Jiang, Mian Dai
An intrusion detection system (IDS) is one of the most extensively used techniques in a network topology to safeguard the integrity and availability of sensitive assets in the protected systems.
1 code implementation • 8 Oct 2018 • Tianyang Hu, Zixiang Chen, Hanxi Sun, Jincheng Bai, Mao Ye, Guang Cheng
We propose two novel samplers to generate high-quality samples from a given (un-normalized) probability density.
no code implementations • 5 Oct 2018 • Yue Xing, Qifan Song, Guang Cheng
In the era of deep learning, understanding the over-fitting phenomenon becomes increasingly important.
no code implementations • 17 Sep 2018 • Meimei Liu, Jean Honorio, Guang Cheng
In this paper, we propose a random projection approach to estimate variance in kernel ridge regression.
no code implementations • ICML 2018 • Ganggang Xu, Zuofeng Shang, Guang Cheng
Divide-and-conquer is a powerful approach for large and massive data analysis.
no code implementations • 25 May 2018 • Meimei Liu, Zuofeng Shang, Guang Cheng
It is worth noting that the upper bounds on the number of machines are proven to be un-improvable (up to a logarithmic factor) in two important cases: smoothing spline regression and Gaussian RKHS regression.
no code implementations • NeurIPS 2018 • Meimei Liu, Guang Cheng
Early stopping of iterative algorithms is an algorithmic regularization method to avoid over-fitting in estimation and classification.
no code implementations • 17 Feb 2018 • Meimei Liu, Zuofeng Shang, Guang Cheng
A common challenge in nonparametric inference is its high computational complexity when data volume is large.
no code implementations • 29 Jan 2018 • Botao Hao, Anru Zhang, Guang Cheng
In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings.
no code implementations • 20 Jan 2017 • Will Wei Sun, Guang Cheng, Yufeng Liu
Stability is an important aspect of a classification procedure because unstable predictions can potentially reduce users' trust in a classification system and also harm the reproducibility of scientific conclusions.
no code implementations • ICML 2018 • Ganggang Xu, Zuofeng Shang, Guang Cheng
Tuning parameter selection is of critical importance for kernel ridge regression.
no code implementations • 28 Nov 2016 • Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng
We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations.
no code implementations • 15 Sep 2016 • Xiang Lyu, Will Wei Sun, Zhaoran Wang, Han Liu, Jian Yang, Guang Cheng
We consider the estimation and inference of graphical models that characterize the dependency structure of high-dimensional tensor-valued data.
no code implementations • 31 Dec 2015 • Zuofeng Shang, Guang Cheng
In this paper, we explore statistical versus computational trade-off to address a basic question in the application of a distributed algorithm: what is the minimal computational cost in obtaining statistical optimality?
Statistics Theory
no code implementations • NeurIPS 2015 • Wei Sun, Zhaoran Wang, Han Liu, Guang Cheng
We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data.
no code implementations • 5 Feb 2015 • Will Wei Sun, Junwei Lu, Han Liu, Guang Cheng
We propose a novel sparse tensor decomposition method, namely Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components.
no code implementations • CVPR 2014 • Yuanxiang Wang, Hesamoddin Salehian, Guang Cheng, Baba C. Vemuri
In this paper, we propose a new intrinsic recursive filter on the product manifold of shape and orientation.
no code implementations • 26 May 2014 • Wei Sun, Xingye Qiao, Guang Cheng
In this paper, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method.
no code implementations • 30 Dec 2012 • Zuofeng Shang, Guang Cheng
In particular, our confidence intervals are proved to be asymptotically valid at any point in the support, and they are shorter on average than the Bayesian confidence intervals proposed by Wahba [J. R. Stat.