Search Results for author: Zhihua Zhang

Found 76 papers, 7 papers with code

Lower Complexity Bounds for Finite-Sum Convex-Concave Minimax Optimization Problems

no code implementations ICML 2020 Guangzeng Xie, Luo Luo, Yijiang Lian, Zhihua Zhang

This paper studies lower complexity bounds for the minimax optimization problem whose objective function is the average of $n$ individual smooth convex-concave functions.

Near Optimal Stochastic Algorithms for Finite-Sum Unbalanced Convex-Concave Minimax Optimization

no code implementations3 Jun 2021 Luo Luo, Guangzeng Xie, Tong Zhang, Zhihua Zhang

This paper considers stochastic first-order algorithms for convex-concave minimax problems of the form $\min_{\mathbf{x}}\max_{\mathbf{y}} f(\mathbf{x}, \mathbf{y})$, where $f$ can be expressed as the average of $n$ individual components which are $L$-average smooth.
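
To make the problem class concrete, below is a minimal stochastic gradient descent-ascent (SGDA) sketch on a toy finite-sum convex-concave objective with synthetic data; it only illustrates the setting the paper studies and is not the paper's algorithm.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 5
    A = rng.normal(size=(n, d, d))       # one coupling matrix per component

    # f(x, y) = (1/n) sum_i [ ||x||^2/2 + x^T A_i y - ||y||^2/2 ]
    x, y = np.ones(d), np.ones(d)
    eta = 0.02
    for t in range(5000):
        i = rng.integers(n)              # sample one component
        gx = x + A[i] @ y                # stochastic gradient in x
        gy = A[i].T @ x - y              # stochastic gradient in y
        x, y = x - eta * gx, y + eta * gy
    print("distance to the saddle point (0, 0):",
          np.hypot(np.linalg.norm(x), np.linalg.norm(y)))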

Memory-Efficient Differentiable Transformer Architecture Search

no code implementations31 May 2021 Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen

To this end, we propose a multi-split reversible network and combine it with DARTS.

Directional Convergence Analysis under Spherically Symmetric Distribution

no code implementations9 May 2021 Dachao Lin, Zhihua Zhang

We consider the fundamental problem of learning linear predictors (i.e., separable datasets with zero margin) using neural networks with gradient flow or gradient descent.

Non-asymptotic Performances of Robust Markov Decision Processes

no code implementations9 May 2021 Wenhao Yang, Zhihua Zhang

The optimal robust policy is solved from a generative model or an offline dataset without access to the true transition dynamics.

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

no code implementations12 Apr 2021 Guangzeng Xie, Hao Jin, Dachao Lin, Zhihua Zhang

We propose \textit{Meta-Regularization}, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.

DIPPA: An Improved Method for Bilinear Saddle Point Problems

no code implementations15 Mar 2021 Guangzeng Xie, Yuze Han, Zhihua Zhang

This paper studies bilinear saddle point problems $\min_{\mathbf{x}} \max_{\mathbf{y}} g(\mathbf{x}) + \mathbf{x}^{\top} \mathbf{A} \mathbf{y} - h(\mathbf{y})$, where the functions $g, h$ are smooth and strongly convex.
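
As a baseline for this problem class (not DIPPA itself), a minimal extragradient sketch on a bilinear saddle point with quadratic $g, h$; the step size and data are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 4
    A = rng.normal(size=(d, d))
    mu = 1.0                                  # strong convexity of g and h

    # g(x) = mu ||x||^2 / 2,  h(y) = mu ||y||^2 / 2
    grad_x = lambda x, y: mu * x + A @ y      # d/dx [g(x) + x^T A y]
    grad_y = lambda x, y: A.T @ x - mu * y    # d/dy [x^T A y - h(y)]

    x, y = np.ones(d), np.ones(d)
    eta = 0.1
    for t in range(1000):
        xh, yh = x - eta * grad_x(x, y), y + eta * grad_y(x, y)    # half step
        x, y = x - eta * grad_x(xh, yh), y + eta * grad_y(xh, yh)  # full step
    print("distance to the saddle point (0, 0):",
          np.hypot(np.linalg.norm(x), np.linalg.norm(y)))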

Privacy-Preserving Distributed SVD via Federated Power

no code implementations1 Mar 2021 Xiao Guo, Xiang Li, Xiangyu Chang, Shusen Wang, Zhihua Zhang

The low communication and computational power of such devices, and the possible privacy breaches of users' sensitive data, make the computation of SVD challenging.

Federated Learning

Delayed Projection Techniques for Linearly Constrained Problems: Convergence Rates, Acceleration, and Applications

no code implementations5 Jan 2021 Xiang Li, Zhihua Zhang

In this work, we study a novel class of projection-based algorithms for linearly constrained problems (LCPs) which have a lot of applications in statistics, optimization, and machine learning.

Distributed Optimization

Optimal Designs of Gaussian Processes with Budgets for Hyperparameter Optimization

no code implementations1 Jan 2021 Yimin Huang, YuJun Li, Zhenguo Li, Zhihua Zhang

Moreover, comparisons between different initial designs with the same model show the advantage of the proposed optimal design.

Gaussian Processes Hyperparameter Optimization

Intervention Generative Adversarial Nets

no code implementations1 Jan 2021 Jiadong Liang, Liangyu Zhang, Cheng Zhang, Zhihua Zhang

In this paper we propose a novel approach for stabilizing the training process of Generative Adversarial Networks as well as alleviating the mode collapse problem.

On the Landscape of Sparse Linear Networks

no code implementations1 Jan 2021 Dachao Lin, Ruoyu Sun, Zhihua Zhang

Network pruning, or sparse networks, has a long history and practical significance in modern applications.

Network Pruning

Train Once, and Decode As You Like

no code implementations COLING 2020 Chao Tian, Yifei Wang, Hao Cheng, Yijiang Lian, Zhihua Zhang

In this paper we propose a unified approach for supporting different generation manners of machine translation, including autoregressive, semi-autoregressive, and refinement-based non-autoregressive models.

Machine Translation

On the Landscape of One-hidden-layer Sparse Networks and Beyond

no code implementations16 Sep 2020 Dachao Lin, Ruoyu Sun, Zhihua Zhang

We show that sparse linear networks can have spurious strict minima, which is in sharp contrast to dense linear networks which do not even have spurious minima.

Network Pruning

Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond

no code implementations30 Aug 2020 Dachao Lin, Peiqin Sun, Guangzeng Xie, Shuchang Zhou, Zhihua Zhang

Quantized Neural Networks (QNNs) use low bit-width fixed-point numbers to represent weight parameters and activations, and are often used in real-world applications due to their savings in computational resources and their reproducibility of results.
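
To make the deployment setting concrete, here is a hypothetical sketch that folds batch normalization into the preceding layer's weights and then applies uniform fixed-point quantization; the helpers fold_bn and quantize are illustrative, not the paper's method.

    import numpy as np

    def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
        """Fold BN(gamma, beta, mean, var) into per-channel weights w, bias b."""
        scale = gamma / np.sqrt(var + eps)
        return w * scale[:, None], (b - mean) * scale + beta

    def quantize(x, bits=8):
        """Uniform symmetric fixed-point quantization (dequantized output)."""
        qmax = 2 ** (bits - 1) - 1
        step = np.abs(x).max() / qmax    # per-tensor scale
        return np.round(x / step) * step

    rng = np.random.default_rng(2)
    w, b = rng.normal(size=(16, 32)), rng.normal(size=16)
    gamma, beta = np.ones(16), np.zeros(16)
    mean, var = rng.normal(size=16), rng.uniform(0.5, 2.0, size=16)

    w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
    w_q = quantize(w_f, bits=8)
    print("max weight quantization error:", np.abs(w_f - w_q).max())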

Affine Transformation Quantization

Intervention Generative Adversarial Networks

no code implementations9 Aug 2020 Jiadong Liang, Liangyu Zhang, Cheng Zhang, Zhihua Zhang

In this paper we propose a novel approach for stabilizing the training process of Generative Adversarial Networks as well as alleviating the mode collapse problem.

An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization

1 code implementation11 Jul 2020 Yimin Huang, Yu-Jun Li, Hanrong Ye, Zhenguo Li, Zhihua Zhang

The evaluation of hyperparameters, neural architectures, or data augmentation policies becomes a critical model selection problem in advanced deep learning with a large hyperparameter search space.

Data Augmentation Hyperparameter Optimization +3

Communication-Efficient Distributed SVD via Local Power Iterations

no code implementations19 Feb 2020 Xiang Li, Shusen Wang, Kun Chen, Zhihua Zhang

As a practical surrogate of OPT, sign-fixing, which uses a diagonal matrix with $\pm 1$ entries as weights, has better computational complexity and stability in experiments.
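
Below is a minimal single-machine simulation of distributed power iteration with periodic averaging (the "local iterations" idea); the paper's OPT weighting and sign-fixing details are omitted, and the planted-spike data are synthetic.

    import numpy as np

    rng = np.random.default_rng(3)
    m, rows, d = 4, 50, 8
    spike = rng.normal(size=d)
    spike /= np.linalg.norm(spike)       # planted dominant direction
    blocks = [rng.normal(size=(rows, d)) + 5.0 * rng.normal(size=(rows, 1)) * spike
              for _ in range(m)]         # row-partitioned data matrix A
    A = np.vstack(blocks)

    v = rng.normal(size=d)               # shared initialization
    v /= np.linalg.norm(v)
    v_local = [v.copy() for _ in range(m)]
    for r in range(10):                  # communication rounds
        for k in range(m):
            for _ in range(4):           # local power iterations per round
                w = blocks[k].T @ (blocks[k] @ v_local[k])
                v_local[k] = w / np.linalg.norm(w)
        v = sum(v_local)                 # one communication: average
        v /= np.linalg.norm(v)
        v_local = [v.copy() for _ in range(m)]

    v_true = np.linalg.svd(A)[2][0]      # true top right singular vector
    print("alignment |<v, v_true>|:", abs(v @ v_true))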

Distributed Computing

Fast Generalized Matrix Regression with Applications in Machine Learning

no code implementations27 Dec 2019 Haishan Ye, Shusen Wang, Zhihua Zhang, Tong Zhang

Fast matrix algorithms have become fundamental tools of machine learning in the big data era.

Communication-Efficient Local Decentralized SGD Methods

no code implementations21 Oct 2019 Xiang Li, Wenhao Yang, Shusen Wang, Zhihua Zhang

Recently, the technique of local updates has become a powerful tool in centralized settings for improving communication efficiency via periodic communication.

Distributed Computing

Distillation $\approx$ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network

1 code implementation2 Oct 2019 Bin Dong, Jikai Hou, Yiping Lu, Zhihua Zhang

Assuming that the teacher network is overparameterized, we argue that the teacher network is essentially harvesting dark knowledge from the data via early stopping.

Information Retrieval

A Stochastic Proximal Point Algorithm for Saddle-Point Problems

no code implementations13 Sep 2019 Luo Luo, Cheng Chen, Yu-Jun Li, Guangzeng Xie, Zhihua Zhang

We consider saddle point problems whose objective functions are the average of $n$ strongly convex-concave individual components.

A General Analysis Framework of Lower Complexity Bounds for Finite-Sum Optimization

no code implementations22 Aug 2019 Guangzeng Xie, Luo Luo, Zhihua Zhang

This paper studies lower complexity bounds for the optimization problem whose objective function is the average of $n$ individual smooth convex functions.

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks

no code implementations18 Aug 2019 Hao Jin, Dachao Lin, Zhihua Zhang

Stochastic variance-reduced gradient (SVRG) is a classical optimization method.
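
For context, a minimal SVRG sketch on least squares is given below; it shows only the classical variance-reduction scheme, not the BP-SVRG modification studied in the paper, and all data are synthetic.

    import numpy as np

    rng = np.random.default_rng(4)
    n, d = 200, 10
    X, w_true = rng.normal(size=(n, d)), rng.normal(size=d)
    y = X @ w_true

    grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]   # per-sample gradient
    full_grad = lambda w: X.T @ (X @ w - y) / n

    w = np.zeros(d)
    eta = 0.01
    for s in range(20):                  # outer epochs
        w_snap = w.copy()                # snapshot point
        g_snap = full_grad(w_snap)       # full gradient at snapshot
        for t in range(n):               # inner loop
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            g = grad_i(w, i) - grad_i(w_snap, i) + g_snap
            w -= eta * g
    print("distance to w_true:", np.linalg.norm(w - w_true))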

On the Convergence of FedAvg on Non-IID Data

1 code implementation ICLR 2020 Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang

In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
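
A minimal simulation of \texttt{FedAvg} (local SGD plus server averaging) on a strongly convex least-squares problem, the regime of the stated rate, is sketched below; the clients, shifts, and step size are all illustrative.

    import numpy as np

    rng = np.random.default_rng(5)
    K, n, d = 10, 50, 5                  # clients, samples per client, dim
    w_true = rng.normal(size=d)
    data = []
    for k in range(K):                   # non-iid: client-specific shift
        Xk = rng.normal(size=(n, d)) + 0.5 * k / K
        data.append((Xk, Xk @ w_true))

    w = np.zeros(d)
    eta, rounds, local_steps = 0.01, 100, 5
    for r in range(rounds):
        updates = []
        for Xk, yk in data:              # in practice: a sampled subset
            wk = w.copy()
            for _ in range(local_steps): # E local SGD steps
                i = rng.integers(n)
                wk -= eta * (Xk[i] @ wk - yk[i]) * Xk[i]
            updates.append(wk)
        w = sum(updates) / K             # server averages local models
    print("distance to w_true:", np.linalg.norm(w - w_true))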

Edge-computing Federated Learning

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

no code implementations28 May 2019 Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks.

Distributionally Robust Optimization Leads to Better Generalization: on SGD and Beyond

no code implementations ICLR 2019 Jikai Hou, Kaixuan Huang, Zhihua Zhang

In this paper, we adopt distributionally robust optimization (DRO) (Ben-Tal et al., 2013) in the hope of achieving better generalization in deep learning tasks.

Hyper-Regularization: An Adaptive Choice for the Learning Rate in Gradient Descent

no code implementations ICLR 2019 Guangzeng Xie, Hao Jin, Dachao Lin, Zhihua Zhang

Specifically, we impose a regularization term on the learning rate via a generalized distance, and cast the joint updating process of the parameter and the learning rate as a max-min problem.

A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning

no code implementations NeurIPS 2019 Xiang Li, Wenhao Yang, Zhihua Zhang

We propose and study a general framework for regularized Markov decision processes (MDPs) where the goal is to find an optimal policy that maximizes the expected discounted total reward plus a policy regularization term.

Lipschitz Generative Adversarial Nets

1 code implementation15 Feb 2019 Zhiming Zhou, Jiadong Liang, Yuxuan Song, Lantao Yu, Hongwei Wang, Wei-Nan Zhang, Yong Yu, Zhihua Zhang

By contrast, Wasserstein GAN (WGAN), where the discriminative function is restricted to be 1-Lipschitz, does not suffer from such a gradient uninformativeness problem.

Do Subsampled Newton Methods Work for High-Dimensional Data?

no code implementations13 Feb 2019 Xiang Li, Shusen Wang, Zhihua Zhang

Subsampled Newton methods approximate Hessian matrices through subsampling techniques, alleviating the cost of forming Hessian matrices while still using sufficient curvature information.
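
A minimal subsampled Newton sketch for regularized least squares is given below: the Hessian is estimated from a row subsample while the gradient is exact. It illustrates the method family under synthetic data, not the paper's specific analysis.

    import numpy as np

    rng = np.random.default_rng(6)
    n, d, lam, s = 2000, 20, 0.1, 200    # s = Hessian subsample size
    X, w_true = rng.normal(size=(n, d)), rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    w = np.zeros(d)
    for t in range(10):
        g = X.T @ (X @ w - y) / n + lam * w               # exact gradient
        idx = rng.choice(n, size=s, replace=False)
        H = X[idx].T @ X[idx] / s + lam * np.eye(d)       # subsampled Hessian
        w -= np.linalg.solve(H, g)       # approximate Newton step
    print("distance to w_true:", np.linalg.norm(w - w_true))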

Distributed Optimization

Hierarchical Attention: What Really Counts in Various NLP Tasks

1 code implementation10 Aug 2018 Zehao Dou, Zhihua Zhang

Ham achieves a state-of-the-art BLEU score of 0.26 on the Chinese poem generation task and a nearly 6.5% average improvement over existing machine reading comprehension models such as BiDAF and Match-LSTM.

Machine Reading Comprehension Machine Translation +2

Understanding the Effectiveness of Lipschitz-Continuity in Generative Adversarial Nets

1 code implementation2 Jul 2018 Zhiming Zhou, Yuxuan Song, Lantao Yu, Hongwei Wang, Jiadong Liang, Wei-Nan Zhang, Zhihua Zhang, Yong Yu

In this paper, we investigate the underlying factor that leads to failure and success in the training of GANs.

Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks

no code implementations17 May 2018 Guangzeng Xie, Yitan Wang, Shuchang Zhou, Zhihua Zhang

In this paper we explore acceleration techniques for large scale nonconvex optimization problems with special focuses on deep neural networks.

Nesterov's Acceleration For Approximate Newton

no code implementations17 Oct 2017 Haishan Ye, Zhihua Zhang

Moreover, the accelerated regularized sub-sampled Newton method has performance comparable to or even better than classical algorithms.

Approximate Newton Methods and Their Local Convergence

no code implementations ICML 2017 Haishan Ye, Luo Luo, Zhihua Zhang

We propose a unifying framework to analyze local convergence properties of second order methods.

Nesterov's Acceleration For Second Order Method

no code implementations19 May 2017 Haishan Ye, Zhihua Zhang

Moreover, the accelerated regularized sub-sampled Newton method has performance comparable to or even better than state-of-the-art algorithms.

Robust Frequent Directions with Application in Online Learning

no code implementations15 May 2017 Luo Luo, Cheng Chen, Zhihua Zhang, Wu-Jun Li, Tong Zhang

We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm.
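
For background, here is a minimal sketch of the classical Frequent Directions algorithm that RFD builds on; RFD's additional robust regularization term is omitted, and the data are synthetic.

    import numpy as np

    def frequent_directions(A, ell):
        """Return B (ell x d) such that B^T B approximates A^T A."""
        n, d = A.shape
        B = np.zeros((2 * ell, d))
        row = 0
        for a in A:
            if row == 2 * ell:           # buffer full: shrink to ell rows
                _, s, Vt = np.linalg.svd(B, full_matrices=False)
                shrunk = np.sqrt(np.maximum(s**2 - s[ell - 1]**2, 0.0))
                B = np.zeros((2 * ell, d))
                B[:len(shrunk)] = shrunk[:, None] * Vt
                row = ell
            B[row] = a
            row += 1
        # final shrink of whatever remains in the buffer
        _, s, Vt = np.linalg.svd(B[:row], full_matrices=False)
        cut = s[ell - 1]**2 if row > ell else 0.0
        shrunk = np.sqrt(np.maximum(s**2 - cut, 0.0))
        out = np.zeros((ell, d))
        out[:min(ell, len(shrunk))] = (shrunk[:, None] * Vt)[:ell]
        return out

    rng = np.random.default_rng(7)
    A = rng.normal(size=(1000, 30))
    B = frequent_directions(A, ell=10)
    err = np.linalg.norm(A.T @ A - B.T @ B, 2)
    print("spectral error / ||A||_F^2:", err / np.linalg.norm(A) ** 2)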

Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features

no code implementations2 Dec 2016 Zihao Chen, Luo Luo, Zhihua Zhang

Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features.

An Efficient Character-Level Neural Machine Translation

1 code implementation16 Aug 2016 Shenjian Zhao, Zhihua Zhang

The encoder-decoder architecture with an attention mechanism achieves a translation performance comparable to the existing state-of-the-art phrase-based systems on the task of English-to-French translation.

Machine Translation

A Proximal Stochastic Quasi-Newton Algorithm

no code implementations31 Jan 2016 Luo Luo, Zihao Chen, Zhihua Zhang, Wu-Jun Li

It incorporates the Hessian in the smooth part of the function and exploits a multistage scheme to reduce the variance of the stochastic gradient.

Wishart Mechanism for Differentially Private Principal Components Analysis

no code implementations18 Nov 2015 Wuxuan Jiang, Cong Xie, Zhihua Zhang

We propose a new input perturbation mechanism for publishing a covariance matrix to achieve $(\epsilon, 0)$-differential privacy.
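
A sketch of the input-perturbation idea, adding Wishart noise to the sample covariance before eigendecomposition, is given below; the noise scale used here is illustrative only and is not the paper's calibration for $(\epsilon, 0)$-differential privacy.

    import numpy as np
    from scipy.stats import wishart

    rng = np.random.default_rng(8)
    n, d, eps = 500, 6, 1.0
    X = rng.normal(size=(n, d))
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)  # row norm <= 1

    C = X.T @ X                          # covariance (Gram) matrix
    scale = np.eye(d) / (2 * n * eps)    # illustrative scale, not the paper's
    W = wishart(df=d + 1, scale=scale).rvs(random_state=8)
    C_priv = C + W                       # privately released matrix

    # downstream PCA uses only the perturbed matrix
    eigvals, eigvecs = np.linalg.eigh(C_priv)
    print("top private eigenvalue:", eigvals[-1])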

A New Relaxation Approach to Normalized Hypergraph Cut

no code implementations9 Nov 2015 Cong Xie, Wu-Jun Li, Zhihua Zhang

Normalized graph cut (NGC) has become a popular research topic due to its wide applications in a large variety of areas like machine learning and very large scale integration (VLSI) circuit design.

The Singular Value Decomposition, Applications and Beyond

no code implementations29 Oct 2015 Zhihua Zhang

Built on SVD and a theory of symmetric gauge functions, we discuss unitarily invariant norms, which are then used to formulate general results for matrix low rank approximation.
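
As a concrete instance, the Eckart-Young theorem says the truncated SVD gives the best rank-$k$ approximation in every unitarily invariant norm; a minimal numerical check on synthetic data:

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.normal(size=(40, 30))
    k = 5
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]    # best rank-k approximation
    print("spectral error :", np.linalg.norm(A - A_k, 2),
          "vs sigma_{k+1} =", s[k])
    print("Frobenius error:", np.linalg.norm(A - A_k),
          "vs tail =", np.sqrt(np.sum(s[k:] ** 2)))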

Matrix Completion

Nonconvex Penalization in Sparse Estimation: An Approach Based on the Bernstein Function

no code implementations29 Oct 2015 Zhihua Zhang

In this paper we study nonconvex penalization using Bernstein functions whose first-order derivatives are completely monotone.

General Classification

A Parallel Algorithm for $\mathcal{X}$-Armed Bandits

no code implementations26 Oct 2015 Cheng Chen, Shuang Liu, Zhihua Zhang, Wu-Jun Li

To deal with these large-scale data sets, we study a distributed setting of $\mathcal{X}$-armed bandits, where $m$ players collaborate to find the maximum of the unknown function.

A Scalable and Extensible Framework for Superposition-Structured Models

no code implementations8 Sep 2015 Shenjian Zhao, Cong Xie, Zhihua Zhang

In many learning tasks, structural models usually lead to better interpretability and higher generalization performance.

Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

no code implementations14 Apr 2015 Shuang Liu, Cheng Chen, Zhihua Zhang

When the time horizon is unknown, we measure the frequency of communication through a new notion called the density of the communication set, and give an exact characterization of the interplay between regret and communication.

Multi-Armed Bandits

Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition

no code implementations29 Mar 2015 Shusen Wang, Zhihua Zhang, Tong Zhang

The Nystr\"om method is a special instance of our fast model and is an approximation to the prototype model.

A Nonconvex Approach for Structured Sparse Learning

no code implementations7 Mar 2015 Shubao Zhang, Hui Qian, Zhihua Zhang

In this paper we focus on the $\ell_q$-analysis optimization problem for structured sparse learning ($0< q \leq 1$).

Sparse Learning

Distributed Power-law Graph Computing: Theoretical and Empirical Analysis

no code implementations NeurIPS 2014 Cong Xie, Ling Yan, Wu-Jun Li, Zhihua Zhang

We theoretically prove that DBH can achieve lower communication cost than existing methods and can simultaneously guarantee good workload balance.
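
A toy sketch of the degree-based hashing (DBH) idea, assigning each edge by hashing its lower-degree endpoint so that power-law hubs rather than low-degree vertices get cut, is given below; the function dbh_partition is a hypothetical simplification, not the paper's implementation.

    from collections import Counter

    def dbh_partition(edges, degree, num_machines):
        assign = {}
        for u, v in edges:
            # hash the endpoint with the smaller degree
            key = u if degree[u] <= degree[v] else v
            assign[(u, v)] = hash(key) % num_machines
        return assign

    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]  # vertex 0 is a hub
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    for e, m in dbh_partition(edges, degree, num_machines=3).items():
        print(e, "-> machine", m)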

graph partitioning

Group Orbit Optimization: A Unified Approach to Data Normalization

no code implementations3 Oct 2014 Shuchang Zhou, Zhihua Zhang, Xiaobing Feng

In this paper we propose and study an optimization problem over a matrix group orbit that we call \emph{Group Orbit Optimization} (GOO).

Tensor Decomposition

SPSD Matrix Approximation via Column Selection: Theories, Algorithms, and Extensions

no code implementations22 Jun 2014 Shusen Wang, Luo Luo, Zhihua Zhang

In this paper we conduct in-depth studies of an SPSD matrix approximation model and establish strong relative-error bounds.

Efficient Algorithms and Error Analysis for the Modified Nystrom Method

no code implementations1 Apr 2014 Shusen Wang, Zhihua Zhang

Recently, a variant of the Nystr\"om method called the modified Nystr\"om method has demonstrated significant improvement over the standard Nystr\"om method in approximation accuracy, both theoretically and empirically.
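
For orientation, a minimal sketch of the standard Nystr\"om approximation $K \approx C W^{+} C^{\top}$ for an SPSD kernel matrix, with $C = K[:, S]$ and $W$ the sampled principal submatrix; the modified variant instead uses $C^{+} K (C^{+})^{\top}$ as the middle matrix, which is omitted here. Kernel and data are illustrative.

    import numpy as np

    rng = np.random.default_rng(10)
    n, c = 300, 30
    X = rng.normal(size=(n, 3))
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF kernel (SPSD)

    S = rng.choice(n, size=c, replace=False)  # uniform column sampling
    C = K[:, S]
    W = K[np.ix_(S, S)]
    K_nys = C @ np.linalg.pinv(W) @ C.T       # standard Nystrom approximation

    print("relative Frobenius error:",
          np.linalg.norm(K - K_nys) / np.linalg.norm(K))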

The Bernstein Function: A Unifying Framework of Nonconvex Penalization in Sparse Estimation

no code implementations17 Dec 2013 Zhihua Zhang

In this paper we study nonconvex penalization using Bernstein functions.

The Matrix Ridge Approximation: Algorithms and Applications

no code implementations17 Dec 2013 Zhihua Zhang

We are concerned with an approximation problem for a symmetric positive semidefinite matrix, motivated by a class of nonlinear machine learning methods.

Compound Poisson Processes, Latent Shrinkage Priors and Bayesian Nonconvex Penalization

no code implementations28 Aug 2013 Zhihua Zhang, Jin Li

In this paper we discuss Bayesian nonconvex penalization for sparse learning problems.

Sparse Learning

Kinetic Energy Plus Penalty Functions for Sparse Estimation

no code implementations22 Jul 2013 Zhihua Zhang, Shibo Zhao, Zebang Shen, Shuchang Zhou

In this paper we propose and study a family of sparsity-inducing penalty functions.

Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling

no code implementations18 Mar 2013 Shusen Wang, Zhihua Zhang

The CUR matrix decomposition and the Nystr\"{o}m approximation are two important low-rank matrix approximation techniques.

A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

no code implementations NeurIPS 2012 Shusen Wang, Zhihua Zhang

The CUR matrix decomposition is an important extension of the Nyström approximation to general matrices.

Optimal Scoring for Unsupervised Learning

no code implementations NeurIPS 2009 Zhihua Zhang, Guang Dai

We are often interested in casting classification and clustering problems in a regression framework, because it is feasible to achieve some statistical properties in this framework by imposing some penalty criteria.

General Classification

Posterior Consistency of the Silverman g-prior in Bayesian Model Choice

no code implementations NeurIPS 2008 Zhihua Zhang, Michael I. Jordan, Dit-yan Yeung

The duality between regularization and prior leads to interpreting regularization methods in terms of maximum a posteriori estimation and has motivated Bayesian interpretations of kernel methods.
