no code implementations • Findings (ACL) 2022 • Hao Cheng, Zhihua Zhang
The Conditional Masked Language Model (CMLM) is a strong baseline for non-autoregressive translation (NAT).
no code implementations • ICML 2020 • Guangzeng Xie, Luo Luo, Yijiang Lian, Zhihua Zhang
This paper studies the lower bound complexity for the minimax optimization problem whose objective function is the average of $n$ individual smooth convex-concave functions.
no code implementations • 19 Oct 2024 • Chuhan Xie, Kaicheng Jin, Jiadong Liang, Zhihua Zhang
We study time-uniform statistical inference for parameters in stochastic approximation (SA), which covers a wide range of applications in optimization and machine learning.
no code implementations • 7 May 2024 • Hao Jin, Yang Peng, Liangyu Zhang, Zhihua Zhang
In the face of the heterogeneity among restricted regions, we first introduce the concept of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol that we call the Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training.
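A minimal toy sketch of the periodic-aggregation idea behind FedQ (the paper's actual protocol is more refined; the `agent.step()` environment API below is hypothetical):

```python
import numpy as np

def federated_q_sketch(agents, n_states, n_actions, rounds=100,
                       local_steps=50, alpha=0.1, gamma=0.99):
    """Toy sketch: each agent explores only its restricted region;
    periodically the agents' Q-tables are averaged so that knowledge
    of one region leaks to the others."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(rounds):
        local_tables = []
        for agent in agents:
            q = Q.copy()
            for _ in range(local_steps):
                s, a, r, s_next = agent.step()  # hypothetical env API
                q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
            local_tables.append(q)
        Q = np.mean(local_tables, axis=0)       # periodic aggregation
    return Q
```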
no code implementations • 6 May 2024 • Hao Jin, Liangyu Zhang, Zhihua Zhang
In our setting, we aim to solve a reinforcement learning problem with multiple constraints while $N$ training agents are located in $N$ different environments with limited access to the constraint signals and they are expected to collaboratively learn a policy satisfying all constraint signals.
no code implementations • 9 Mar 2024 • Yang Peng, Liangyu Zhang, Zhihua Zhang
In the tabular case, Rowland et al. (2018) and Rowland et al. (2023) proved the asymptotic convergence of two instances of distributional TD, namely categorical temporal difference learning (CTD) and quantile temporal difference learning (QTD), respectively.
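As background, a minimal sketch of the QTD update for policy evaluation, assuming a finite state space and a stream of sampled transitions (the quantile-regression step is the standard one; hyperparameters are illustrative):

```python
import numpy as np

def qtd_evaluation(transitions, n_states, m=8, gamma=0.99, alpha=0.1, seed=0):
    """theta[s, i] estimates the tau_i-quantile of the return from s,
    with tau_i = (2i + 1) / (2m); each update is a quantile-regression
    step toward a sampled distributional Bellman target."""
    rng = np.random.default_rng(seed)
    taus = (2 * np.arange(m) + 1) / (2 * m)
    theta = np.zeros((n_states, m))
    for (s, r, s_next) in transitions:
        j = rng.integers(m)                       # sample one target atom
        target = r + gamma * theta[s_next, j]
        theta[s] += alpha * (taus - (target < theta[s]).astype(float))
    return theta
```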
no code implementations • 8 Jan 2024 • Yuze Han, Xiang Li, Zhihua Zhang
In two-time-scale stochastic approximation (SA), two iterates are updated at varying speeds using different step sizes, with each update influencing the other.
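A minimal sketch of the scheme on a toy coupled linear system, with illustrative step-size exponents (the fast iterate tracks the slow one, whose step-size ratio vanishes):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = 0.0, 0.0
for t in range(1, 100_000):
    a_t = 1.0 / t ** 0.6   # fast step size
    b_t = 1.0 / t ** 0.9   # slow step size (b_t / a_t -> 0)
    nx, ny = rng.normal(size=2)
    x += a_t * ((y - x) + nx)      # fast iterate tracks y
    y += b_t * ((1.0 - y) + ny)    # slow iterate drifts toward 1
print(x, y)  # both iterates end up near 1
```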
no code implementations • 28 Oct 2023 • Boya Zhang, Weijian Luo, Zhihua Zhang
Based on our findings, we propose Purify++, a new diffusion purification algorithm that is now the state-of-the-art purification method against several adversarial attacks.
1 code implementation • 29 Sep 2023 • Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
This implies the distributional policy evaluation problem can be solved with sample efficiency.
no code implementations • 28 Sep 2023 • Yuhang Zhang, Yue Liu, Zhihua Zhang
Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations.
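A minimal sketch of the weight-fitting step, assuming we match mean covariate profiles over the probability simplex (the paper's construction involves more than this matching step):

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_weights(X_sources, x_target):
    """Fit simplex weights w so that w @ X_sources matches the target
    covariate profile. X_sources: (n_sources, d) source means,
    x_target: (d,) target mean."""
    n = X_sources.shape[0]
    obj = lambda w: np.sum((w @ X_sources - x_target) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(obj, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, 1.0)] * n, constraints=cons)
    return res.x
```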
1 code implementation • NeurIPS 2023 • Boya Zhang, Weijian Luo, Zhihua Zhang
However, existing diffusion-based defenses rely on the sequential simulation of the reversed stochastic differential equations of diffusion models, which are computationally inefficient and yield suboptimal results.
no code implementations • 4 Jul 2023 • Weijian Luo, Hao Jiang, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Zhihua Zhang
In image generation experiments, the proposed DCD is capable of training an energy-based model for generating CelebA $32\times 32$ images, with results comparable to existing EBMs.
no code implementations • 8 Jun 2023 • Weijian Luo, Boya Zhang, Zhihua Zhang
These benchmarks include sampling from 2D targets, Bayesian inference, and sampling from high-dimensional energy-based models (EBMs).
1 code implementation • NeurIPS 2023 • Weijian Luo, Tianyang Hu, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhihua Zhang
To demonstrate the effectiveness and universality of Diff-Instruct, we consider two scenarios: distilling pre-trained diffusion models and refining existing GAN models.
no code implementations • 3 May 2023 • Hao Cheng, Meng Zhang, Liangyou Li, Qun Liu, Zhihua Zhang
Utilizing pivot language effectively can significantly improve low-resource machine translation.
no code implementations • 3 May 2023 • Hao Cheng, Meng Zhang, Weixuan Wang, Liangyou Li, Qun Liu, Zhihua Zhang
We can use automatic summarization or machine translation evaluation metrics for length-controllable machine translation, but these are not necessarily suitable or accurate.
1 code implementation • 29 Apr 2023 • Liangyu Zhang, Yang Peng, Wenhao Yang, Zhihua Zhang
To the best of our knowledge, we are the first to apply tools from semi-infinite programming (SIP) to solve constrained reinforcement learning problems.
no code implementations • 25 Apr 2023 • Jiadong Liang, Yuze Han, Xiang Li, Zhihua Zhang
Additionally, we propose the Debiased LPSA (DLPSA) as a practical application of our jump diffusion approximation result.
no code implementations • 15 Feb 2023 • Xiang Li, Jiadong Liang, Zhihua Zhang
We study the statistical inference of nonlinear stochastic approximation algorithms utilizing a single trajectory of Markovian data.
no code implementations • 2 Feb 2023 • Wenhao Yang, Han Wang, Tadashi Kozuno, Scott M. Jordan, Zhihua Zhang
Moreover, we prove the alternative form still plays a similar role as the original form.
no code implementations • 12 Sep 2022 • Miao Lu, Wenhao Yang, Liangyu Zhang, Zhihua Zhang
Specifically, we propose a two-stage estimator based on instrumental variables and establish its statistical properties in confounded MDPs with a linear structure.
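For background, a minimal sketch of classical two-stage least squares (2SLS), the template the paper adapts to confounded linear MDPs:

```python
import numpy as np

def two_stage_least_squares(Z, X, y):
    """Classical 2SLS: project endogenous regressors X onto
    instruments Z, then regress the outcome y on the projection."""
    B1, *_ = np.linalg.lstsq(Z, X, rcond=None)        # stage 1
    X_hat = Z @ B1
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)  # stage 2
    return beta
```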
no code implementations • 19 May 2022 • Yizheng Hu, Zhihua Zhang
Cooperative multi-agent reinforcement learning (cMARL) has many real applications, but the policy trained by existing cMARL algorithms is not robust enough when deployed.
no code implementations • 17 May 2022 • Dachao Lin, Zhihua Zhang
In this short note, we give a convergence analysis of the policy iterates in the recently popular policy mirror descent (PMD) method.
1 code implementation • 6 Apr 2022 • Hao Jin, Yang Peng, Wenhao Yang, Shusen Wang, Zhihua Zhang
We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction.
no code implementations • 8 Jan 2022 • Kun Chen, Dachao Lin, Zhihua Zhang
In this paper, we follow Eftekhari's work to give a non-local convergence analysis of deep linear networks.
1 code implementation • 29 Dec 2021 • Xiang Li, Wenhao Yang, Jiadong Liang, Zhihua Zhang, Michael I. Jordan
We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings.
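A minimal sketch of the synchronous, tabular setting with Polyak-Ruppert averaging (the constant step size and inputs are illustrative):

```python
import numpy as np

def averaged_q_learning(P, R, gamma=0.9, T=10_000, alpha=0.05, seed=0):
    """P: (S, A, S) transition tensor, R: (S, A) rewards; every (s, a)
    pair is updated from a fresh sample each step, and the averaged
    iterate Q_bar is returned."""
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    Q_bar = np.zeros((S, A))
    for t in range(1, T + 1):
        s_next = np.array([[rng.choice(S, p=P[s, a]) for a in range(A)]
                           for s in range(S)])        # synchronous samples
        Q += alpha * (R + gamma * Q[s_next].max(axis=-1) - Q)
        Q_bar += (Q - Q_bar) / t   # running Polyak-Ruppert average
    return Q_bar
```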
no code implementations • NeurIPS 2021 • Dachao Lin, Haishan Ye, Zhihua Zhang
In this paper, we follow Rodomanov and Nesterov’s work to study quasi-Newton methods.
no code implementations • NeurIPS 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss.
no code implementations • 3 Sep 2021 • Xiang Li, Jiadong Liang, Xiangyu Chang, Zhihua Zhang
Both the methods are communication efficient and applicable to online data.
no code implementations • 3 Jun 2021 • Luo Luo, Guangzeng Xie, Tong Zhang, Zhihua Zhang
This paper considers stochastic first-order algorithms for convex-concave minimax problems of the form $\min_{\bf x}\max_{\bf y}f(\bf x, \bf y)$, where $f$ can be represented as the average of $n$ individual components that are $L$-average smooth.
no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen
To this end, we propose a multi-split reversible network and combine it with DARTS.
no code implementations • 9 May 2021 • Wenhao Yang, Liangyu Zhang, Zhihua Zhang
In this paper, we study the non-asymptotic and asymptotic performances of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are solved only from a generative model.
no code implementations • 9 May 2021 • Dachao Lin, Zhihua Zhang
We consider the fundamental problem of learning linear predictors (i.e., separable datasets with zero margin) using neural networks with gradient flow or gradient descent.
no code implementations • 12 Apr 2021 • Guangzeng Xie, Hao Jin, Dachao Lin, Zhihua Zhang
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
no code implementations • EACL 2021 • Yuekai Zhao, Shuchang Zhou, Zhihua Zhang
Large-scale transformers have been shown to achieve state-of-the-art performance on neural machine translation.
no code implementations • 15 Mar 2021 • Guangzeng Xie, Yuze Han, Zhihua Zhang
This paper studies bilinear saddle point problems $\min_{\bf{x}} \max_{\bf{y}} g(\bf{x}) + \bf{x}^{\top} \bf{A} \bf{y} - h(\bf{y})$, where the functions $g, h$ are smooth and strongly convex.
no code implementations • 15 Mar 2021 • Yuze Han, Guangzeng Xie, Zhihua Zhang
This construction is amenable to the analysis of PIFO algorithms.
no code implementations • 1 Mar 2021 • Xiao Guo, Xiang Li, Xiangyu Chang, Shusen Wang, Zhihua Zhang
Low communication capacity and potential data privacy breaches make the computation of the eigenspace challenging.
no code implementations • 5 Jan 2021 • Xiang Li, Zhihua Zhang
In this work, we study a novel class of projection-based algorithms for linearly constrained problems (LCPs) which have a lot of applications in statistics, optimization, and machine learning.
no code implementations • 1 Jan 2021 • Jiadong Liang, Liangyu Zhang, Cheng Zhang, Zhihua Zhang
In this paper we propose a novel approach for stabilizing the training process of Generative Adversarial Networks as well as alleviating the mode collapse problem.
no code implementations • 1 Jan 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
Network pruning, or sparse networks, has a long history and practical significance in modern applications.
no code implementations • 1 Jan 2021 • Yimin Huang, YuJun Li, Zhenguo Li, Zhihua Zhang
Moreover, comparisons between different initial designs with the same model show the advantage of the proposed optimal design.
no code implementations • COLING 2020 • Chao Tian, Yifei Wang, Hao Cheng, Yijiang Lian, Zhihua Zhang
In this paper we propose a unified approach for supporting different generation paradigms of machine translation, including autoregressive, semi-autoregressive, and refinement-based non-autoregressive models.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Yuekai Zhao, Haoran Zhang, Shuchang Zhou, Zhihua Zhang
Active learning is an efficient approach for mitigating data dependency when training neural machine translation (NMT) models.
no code implementations • 31 Oct 2020 • Wenhao Yang, Xiang Li, Guangzeng Xie, Zhihua Zhang
Regularized MDPs serve as a smooth version of the original MDPs.
no code implementations • 16 Sep 2020 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
We show that linear networks can have no spurious valleys under special sparse structures, and non-linear networks could also admit no spurious valleys under a wide final layer.
no code implementations • 30 Aug 2020 • Dachao Lin, Peiqin Sun, Guangzeng Xie, Shuchang Zhou, Zhihua Zhang
Quantized Neural Networks (QNNs) use low bit-width fixed-point numbers for representing weight parameters and activations, and are often used in real-world applications due to their saving of computation resources and reproducibility of results.
1 code implementation • 11 Jul 2020 • Yimin Huang, Yu-Jun Li, Hanrong Ye, Zhenguo Li, Zhihua Zhang
The evaluation of hyperparameters, neural architectures, or data augmentation policies becomes a critical model selection problem in advanced deep learning with a large hyperparameter search space.
1 code implementation • 19 Feb 2020 • Xiang Li, Shusen Wang, Kun Chen, Zhihua Zhang
As a practical surrogate of OPT, sign-fixing, which uses a diagonal matrix with $\pm 1$ entries as weights, has lower computational complexity and better stability in experiments.
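Assuming the distributed eigenspace setting, a minimal sketch of the sign-fixing idea: resolve each local eigenvector's sign ambiguity against a reference before averaging (the paper's weighting and analysis go beyond this):

```python
import numpy as np

def sign_fix(U_local, U_ref):
    """Flip each column of U_local so that it is positively correlated
    with the corresponding reference column."""
    s = np.sign(np.sum(U_local * U_ref, axis=0))
    s[s == 0] = 1.0
    return U_local * s
```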
no code implementations • 27 Dec 2019 • Haishan Ye, Shusen Wang, Zhihua Zhang, Tong Zhang
Fast matrix algorithms have become fundamental tools of machine learning in the big data era.
no code implementations • 21 Oct 2019 • Xiang Li, Wenhao Yang, Shusen Wang, Zhihua Zhang
Recently, the technique of local updates has become a powerful tool in centralized settings for improving communication efficiency via periodic communication.
1 code implementation • 2 Oct 2019 • Bin Dong, Jikai Hou, Yiping Lu, Zhihua Zhang
Assuming that the teacher network is overparameterized, we argue that the teacher network is essentially harvesting dark knowledge from the data via early stopping.
no code implementations • 25 Sep 2019 • Guangzeng Xie, Luo Luo, Zhihua Zhang
This paper studies the lower bound complexity for the optimization problem whose objective function is the average of $n$ individual smooth convex functions.
no code implementations • 13 Sep 2019 • Luo Luo, Cheng Chen, Yu-Jun Li, Guangzeng Xie, Zhihua Zhang
We consider saddle point problems whose objective functions are the average of $n$ strongly convex-concave individual components.
no code implementations • 18 Aug 2019 • Hao Jin, Dachao Lin, Zhihua Zhang
Stochastic variance-reduced gradient (SVRG) is a classical optimization method.
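A minimal SVRG sketch on least squares, $f(x) = \frac{1}{2n}\|Ax - b\|^2$ (the step size and epoch length are illustrative):

```python
import numpy as np

def svrg_least_squares(A, b, n_epochs=20, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_epochs):
        x_snap = x.copy()
        full_grad = A.T @ (A @ x_snap - b) / n       # anchor gradient
        for _ in range(2 * n):
            i = rng.integers(n)
            gi = A[i] * (A[i] @ x - b)               # gradient at x
            gi_snap = A[i] * (A[i] @ x_snap - b)     # same sample at snapshot
            x -= eta * (gi - gi_snap + full_grad)    # variance-reduced step
    return x
```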
2 code implementations • ICLR 2020 • Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang
In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
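A minimal FedAvg sketch with several local SGD steps between averaging rounds, as in the paper's setting (the `grads_fn` callback standing in for device-local data is hypothetical):

```python
import numpy as np

def fedavg(grads_fn, n_devices, d, rounds=100, local_steps=10, eta=0.1):
    """grads_fn(k, x) returns a stochastic gradient of device k's
    local objective at x."""
    x = np.zeros(d)
    for _ in range(rounds):
        local_models = []
        for k in range(n_devices):
            xk = x.copy()
            for _ in range(local_steps):
                xk -= eta * grads_fn(k, xk)   # local SGD step
            local_models.append(xk)
        x = np.mean(local_models, axis=0)     # server averaging
    return x
```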
no code implementations • 28 May 2019 • Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang
First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithms for training deep neural networks.
no code implementations • ICLR 2019 • Jikai Hou, Kaixuan Huang, Zhihua Zhang
In this paper, we adopt distributionally robust optimization (DRO) (Ben-Tal et al., 2013) in the hope of achieving better generalization in deep learning tasks.
no code implementations • ICLR 2019 • Guangzeng Xie, Hao Jin, Dachao Lin, Zhihua Zhang
Specifically, we impose a regularization term on the learning rate via a generalized distance, and cast the joint updating process of the parameter and the learning rate into a max-min problem.
no code implementations • NeurIPS 2019 • Xiang Li, Wenhao Yang, Zhihua Zhang
We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term.
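For the entropy-regularization special case, the Bellman maximum becomes a scaled log-sum-exp; a minimal tabular, known-model sketch (this standard instance illustrates the idea, not the paper's full framework):

```python
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, R, gamma=0.9, tau=0.1, n_iters=500):
    """P: (S, A, S) transitions, R: (S, A) rewards; tau is the
    entropy-regularization temperature."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iters):
        Q = R + gamma * P @ V                  # state-action values
        V = tau * logsumexp(Q / tau, axis=1)   # soft maximum over actions
    pi = np.exp((Q - V[:, None]) / tau)        # optimal softmax policy
    return V, pi
```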
1 code implementation • 15 Feb 2019 • Zhiming Zhou, Jiadong Liang, Yuxuan Song, Lantao Yu, Hongwei Wang, Wei-Nan Zhang, Yong Yu, Zhihua Zhang
By contrast, Wasserstein GAN (WGAN), where the discriminative function is restricted to 1-Lipschitz, does not suffer from such a gradient uninformativeness problem.
no code implementations • 13 Feb 2019 • Xiang Li, Shusen Wang, Zhihua Zhang
Subsampled Newton methods approximate Hessian matrices through subsampling techniques, alleviating the cost of forming Hessian matrices while still using sufficient curvature information.
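A minimal sketch of one subsampled Newton step for ridge-regularized least squares (the subsample size and regularizer are illustrative):

```python
import numpy as np

def subsampled_newton_step(A, b, x, sample_size, lam=1e-3, seed=0):
    """Approximate the Hessian (1/n) A^T A + lam I from a row
    subsample, but keep the exact full gradient."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    idx = rng.choice(n, size=sample_size, replace=False)
    As = A[idx]
    H = As.T @ As / sample_size + lam * np.eye(d)  # subsampled Hessian
    g = A.T @ (A @ x - b) / n + lam * x            # exact gradient
    return x - np.linalg.solve(H, g)
```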
no code implementations • 27 Sep 2018 • YuJun Li, Chengzhuo Ni, Guangzeng Xie, Wenhao Yang, Shuchang Zhou, Zhihua Zhang
A2VI is more efficient than the modified policy iteration, which is a classical approximate method for policy evaluation.
1 code implementation • 10 Aug 2018 • Zehao Dou, Zhihua Zhang
Ham achieves a state-of-the-art BLEU score of 0.26 on the Chinese poem generation task and a nearly 6.5% averaged improvement compared with existing machine reading comprehension models such as BIDAF and Match-LSTM.
1 code implementation • 2 Jul 2018 • Zhiming Zhou, Yuxuan Song, Lantao Yu, Hongwei Wang, Jiadong Liang, Wei-Nan Zhang, Zhihua Zhang, Yong Yu
In this paper, we investigate the underlying factor that leads to failure and success in the training of GANs.
no code implementations • 17 May 2018 • Guangzeng Xie, Yitan Wang, Shuchang Zhou, Zhihua Zhang
In this paper we explore acceleration techniques for large scale nonconvex optimization problems with a special focus on deep neural networks.
no code implementations • 17 Oct 2017 • Haishan Ye, Zhihua Zhang
Moreover, the accelerated regularized sub-sampled Newton method achieves performance comparable to, or even better than, classical algorithms.
no code implementations • ICML 2017 • Haishan Ye, Luo Luo, Zhihua Zhang
We propose a unifying framework to analyze local convergence properties of second order methods.
no code implementations • 19 May 2017 • Haishan Ye, Zhihua Zhang
Moreover, the accelerated regularized sub-sampled Newton method achieves performance comparable to, or even better than, state-of-the-art algorithms.
no code implementations • 15 May 2017 • Luo Luo, Cheng Chen, Zhihua Zhang, Wu-Jun Li, Tong Zhang
We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm.
no code implementations • 2 Dec 2016 • Zihao Chen, Luo Luo, Zhihua Zhang
Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features.
1 code implementation • 16 Aug 2016 • Shenjian Zhao, Zhihua Zhang
The encoder-decoder architecture with an attention mechanism achieves a translation performance comparable to the existing state-of-the-art phrase-based systems on the task of English-to-French translation.
no code implementations • 31 Jan 2016 • Luo Luo, Zihao Chen, Zhihua Zhang, Wu-Jun Li
It incorporates the Hessian in the smooth part of the function and exploits a multistage scheme to reduce the variance of the stochastic gradient.
no code implementations • 18 Nov 2015 • Wuxuan Jiang, Cong Xie, Zhihua Zhang
We propose a new input perturbation mechanism for publishing a covariance matrix to achieve $(\epsilon, 0)$-differential privacy.
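A generic input-perturbation sketch: publish $C + E$ for a symmetric noise matrix $E$. The Laplace noise below is only a placeholder showing the mechanism's shape; calibrating the noise distribution and scale to achieve $(\epsilon, 0)$-differential privacy is the paper's contribution.

```python
import numpy as np

def perturbed_covariance(X, noise_scale, seed=0):
    """Publish a noisy covariance matrix; noise_scale must be
    calibrated to the desired privacy guarantee."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    C = X.T @ X / n                         # empirical covariance
    E = rng.laplace(scale=noise_scale, size=(d, d))
    E = np.triu(E) + np.triu(E, 1).T        # symmetrize the noise
    return C + E
```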
no code implementations • 9 Nov 2015 • Cong Xie, Wu-Jun Li, Zhihua Zhang
Normalized graph cut (NGC) has become a popular research topic due to its wide applications in a large variety of areas like machine learning and very large scale integration (VLSI) circuit design.
no code implementations • 29 Oct 2015 • Zhihua Zhang
Built on SVD and a theory of symmetric gauge functions, we discuss unitarily invariant norms, which are then used to formulate general results for matrix low rank approximation.
no code implementations • 29 Oct 2015 • Zhihua Zhang
In this paper we study nonconvex penalization using Bernstein functions whose first-order derivatives are completely monotone.
no code implementations • 26 Oct 2015 • Cheng Chen, Shuang Liu, Zhihua Zhang, Wu-Jun Li
To deal with these large-scale data sets, we study a distributed setting of $\mathcal{X}$-armed bandits, where $m$ players collaborate to find the maximum of the unknown function.
no code implementations • 8 Sep 2015 • Shenjian Zhao, Cong Xie, Zhihua Zhang
In many learning tasks, structural models usually lead to better interpretability and higher generalization performance.
no code implementations • 14 Apr 2015 • Shuang Liu, Cheng Chen, Zhihua Zhang
When the time horizon is unknown, we measure the frequency of communication through a new notion called the density of the communication set, and give an exact characterization of the interplay between regret and communication.
no code implementations • 29 Mar 2015 • Shusen Wang, Zhihua Zhang, Tong Zhang
The Nystr\"om method is a special instance of our fast model and is approximation to the prototype model.
no code implementations • 7 Mar 2015 • Shubao Zhang, Hui Qian, Zhihua Zhang
In this paper we focus on the $\ell_q$-analysis optimization problem for structured sparse learning ($0< q \leq 1$).
no code implementations • 26 Dec 2014 • Shusen Wang, Tong Zhang, Zhihua Zhang
Low-rank matrix completion is an important problem with extensive real-world applications.
no code implementations • NeurIPS 2014 • Cong Xie, Ling Yan, Wu-Jun Li, Zhihua Zhang
We theoretically prove that DBH can achieve lower communication cost than existing methods and can simultaneously guarantee good workload balance.
no code implementations • 3 Oct 2014 • Shuchang Zhou, Zhihua Zhang, Xiaobing Feng
In this paper we propose and study an optimization problem over a matrix group orbit that we call \emph{Group Orbit Optimization} (GOO).
no code implementations • 22 Jun 2014 • Shusen Wang, Luo Luo, Zhihua Zhang
In this paper we conduct in-depth studies of an SPSD matrix approximation model and establish strong relative-error bounds.
no code implementations • 1 Apr 2014 • Shusen Wang, Zhihua Zhang
Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically.
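A minimal sketch contrasting the two variants, for an SPSD matrix A and sampled column indices idx: the standard method uses the pseudoinverse of the core block, while the modified method takes the $U$ minimizing $\|A - CUC^{\top}\|_F$:

```python
import numpy as np

def nystrom_variants(A, idx):
    C = A[:, idx]                      # sampled columns
    W = C[idx, :]                      # core block A[idx][:, idx]
    A_std = C @ np.linalg.pinv(W) @ C.T   # standard Nystrom
    Cp = np.linalg.pinv(C)
    A_mod = C @ (Cp @ A @ Cp.T) @ C.T     # modified Nystrom
    return A_std, A_mod
```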
no code implementations • 17 Dec 2013 • Zhihua Zhang
In this paper we study nonconvex penalization using Bernstein functions.
no code implementations • 17 Dec 2013 • Zhihua Zhang
We are concerned with an approximation problem for a symmetric positive semidefinite matrix, motivated by a class of nonlinear machine learning methods.
no code implementations • 28 Aug 2013 • Zhihua Zhang, Jin Li
In this paper we discuss Bayesian nonconvex penalization for sparse learning problems.
no code implementations • 22 Jul 2013 • Zhihua Zhang, Shibo Zhao, Zebang Shen, Shuchang Zhou
In this paper we propose and study a family of sparsity-inducing penalty functions.
no code implementations • 18 Mar 2013 • Shusen Wang, Zhihua Zhang
The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques.
no code implementations • NeurIPS 2012 • Shusen Wang, Zhihua Zhang
The CUR matrix decomposition is an important extension of Nyström approximation to a general matrix.
no code implementations • NeurIPS 2009 • Wu-Jun Li, Dit-yan Yeung, Zhihua Zhang
The i.i.d. assumption is unreasonable for relational data.
no code implementations • NeurIPS 2009 • Zhihua Zhang, Guang Dai
We are often interested in casting classification and clustering problems in a regression framework, because this framework makes it feasible to achieve certain statistical properties by imposing penalty criteria.
no code implementations • NeurIPS 2008 • Zhihua Zhang, Michael. I. Jordan, Dit-yan Yeung
The duality between regularization and prior leads to interpreting regularization methods in terms of maximum a posteriori estimation and has motivated Bayesian interpretations of kernel methods.