Search Results for author: Tuo Zhao

Found 98 papers, 18 papers with code

Deep Reinforcement Learning with Smooth Policy

no code implementations ICML 2020 Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

In contrast to policy parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, for neural network based reinforcement learning algorithms, there is no readily available solution to learn a smooth policy.

reinforcement-learning

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

no code implementations 9 Jun 2022 Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao

Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness.

Image Classification

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

no code implementations 6 Jun 2022 Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.

A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks

no code implementations 4 May 2022 Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie

Based on the approximation theory of neural networks, we show that the neural network IPM test has the type-II risk in the order of $n^{-(s+\beta)/d}$, which is in the same order of the type-II risk as the Hölder IPM test.

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

1 code implementation ACL 2022 Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.

Ensemble Learning

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

no code implementations 7 Feb 2022 Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao

We investigate the role of noise in optimization algorithms for learning over-parameterized models.

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity

no code implementations 24 Jan 2022 Yan Li, Tuo Zhao, Guanghui Lan

The superlinear convergence takes effect after no more than $\mathcal{O}(\log(1/\Delta^*))$ number of iterations, where $\Delta^*$ is defined via a gap quantity associated with the optimal state-action value function; (2) HPMD also exhibits last-iterate convergence of the policy, with the limiting policy corresponding exactly to the optimal policy with the maximal entropy for every state.

Policy Gradient Methods

Block Policy Mirror Descent

no code implementations 15 Jan 2022 Guanghui Lan, Yan Li, Tuo Zhao

Compared to the traditional PG methods with a batch update rule, which visits and updates the policy for every state, BPMD methods have cheap per-iteration computation via a partial update rule that performs the policy update on a sampled state.

reinforcement-learning

Deep Learning Assisted End-to-End Synthesis of mm-Wave Passive Networks with 3D EM Structures: A Study on A Transformer-Based Matching Network

no code implementations 6 Jan 2022 Siawpeng Er, Edward Liu, Minshuo Chen, Yan Li, Yuqi Liu, Tuo Zhao, Hua Wang

This paper presents a deep learning assisted synthesis approach for direct end-to-end generation of RF/mm-wave passive matching networks with 3D EM structures.

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

no code implementations 1 Jan 2022 Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao

Learning operators between infinite-dimensional spaces is an important learning task with wide applications in machine learning, imaging science, mathematical modeling and simulations, etc.

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

1 code implementation NeurIPS 2021 Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao

Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, and the gap is independent of the number of agents.

Multi-agent Reinforcement Learning

Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits

no code implementations ICLR 2022 Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan

We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent.

Language Modelling Recommendation Systems

Taming Sparsely Activated Transformer with Stochastic Experts

1 code implementation ICLR 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao

While most ongoing research focuses on improving SAMs models by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly-used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.

Machine Translation Translation

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

no code implementations ICLR 2022 Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao

Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that magnitudes of $X$ and $Y$ at the limit of GD iterations will be close even if their initialization is significantly unbalanced.

A Principled Permutation Invariant Approach to Mean-Field Multi-Agent Reinforcement Learning

no code implementations 29 Sep 2021 Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.

Inductive Bias Multi-agent Reinforcement Learning +1

Adversarially Regularized Policy Learning Guided by Trajectory Optimization

no code implementations 16 Sep 2021 Zhigen Zhao, Simiao Zuo, Tuo Zhao, Ye Zhao

Recent advancement in combining trajectory optimization with function approximation (especially neural networks) shows promise in learning complex control policies for diverse tasks in robot systems.

Self-Training with Differentiable Teacher

no code implementations 15 Sep 2021 Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha

In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels.

named-entity-recognition Named Entity Recognition +1

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

no code implementations 7 Sep 2021 Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao

Most existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot adequately explain the empirical success of deep learning on high-dimensional data.

QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction

no code implementations 19 Aug 2021 Danqing Zhang, Zheng Li, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao, Qiang Yang

We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms.

Attribute Value Extraction named-entity-recognition +1

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

no code implementations 15 Aug 2021 Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao

We show that BPPA attains non-trivial margin, which closely depends on the condition number of the distance generating function inducing the Bregman divergence.

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

1 code implementation ACL 2021 Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao

Unfortunately, we observe that weakly labeled data does not necessarily improve, and can even degrade, model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data.

named-entity-recognition Natural Language Processing +1

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation ACL 2021 Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.

Model Compression Multi-Task Learning

Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach

no code implementations 18 May 2021 Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.

Inductive Bias Multi-agent Reinforcement Learning +1

COUnty aggRegation mixup AuGmEntation (COURAGE) COVID-19 Prediction

no code implementations 3 May 2021 Siawpeng Er, Shihao Yang, Tuo Zhao

The global spread of COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, has posed a significant threat to mankind.

Natural Language Processing Time Series

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations Findings (EMNLP) 2021 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation Translation

Reinforcement Learning for Adaptive Mesh Refinement

no code implementations 1 Mar 2021 Jiachen Yang, Tarik Dzanic, Brenden Petersen, Jun Kudo, Ketan Mittal, Vladimir Tomov, Jean-Sylvain Camier, Tuo Zhao, Hongyuan Zha, Tzanio Kolev, Robert Anderson, Daniel Faissol

Large-scale finite element simulations of complex physical systems governed by partial differential equations (PDE) crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required.

Inductive Bias reinforcement-learning

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

no code implementations 24 Feb 2021 Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Ample empirical evidence has corroborated the importance of noise in nonconvex optimization problems.

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

1 code implementation EMNLP 2021 Haoming Jiang, Bo Dai, Mengjiao Yang, Tuo Zhao, Wei Wei

An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments.

Model-based Reinforcement Learning reinforcement-learning +1

Differentiable Top-k with Optimal Transport

no code implementations NeurIPS 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

Finding the k largest or smallest elements from a collection of scores, i.e., the top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining.

Information Retrieval

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

no code implementations NeurIPS 2020 Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.

A Hypergradient Approach to Robust Regression without Correspondence

no code implementations ICLR 2021 Yujia Xie, Yixiu Mao, Simiao Zuo, Hongteng Xu, Xiaojing Ye, Tuo Zhao, Hongyuan Zha

Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and limited to linear regression models.

Multi-Object Tracking

Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks

no code implementations 3 Nov 2020 Minshuo Chen, Hao Liu, Wenjing Liao, Tuo Zhao

Our theory shows that deep neural networks are adaptive to the low-dimensional geometric structures of the covariates, and partially explains the success of deep learning for causal inference.

Causal Inference

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

1 code implementation EMNLP 2020 Lingkai Kong, Haoming Jiang, Yuchen Zhuang, Jie Lyu, Tuo Zhao, Chao Zhang

Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization.

Language Modelling OOD Detection +1

Differentiable Top-$k$ with Optimal Transport

no code implementations NeurIPS Workshop LMCA 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-$k$ operation, i.e., finding the $k$ largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval

How Important is the Train-Validation Split in Meta-Learning?

no code implementations 12 Oct 2020 Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (the train-val method) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.

Meta-Learning

Residual Network Based Direct Synthesis of EM Structures: A Study on One-to-One Transformers

no code implementations 25 Aug 2020 David Munzer, Siawpeng Er, Minshuo Chen, Yan Li, Naga S. Mannem, Tuo Zhao, Hua Wang

We propose using machine learning models for the direct synthesis of on-chip electromagnetic (EM) passive structures to enable rapid or even automated designs and optimizations of RF/mm-Wave circuits.

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

1 code implementation 27 Jun 2020 Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies.

Sparse Learning

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

no code implementations 27 Jun 2020 Xingguo Li, Tuo Zhao, Xiaoming Yuan, Han Liu

This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME).

The huge Package for High-dimensional Undirected Graph Estimation in R

no code implementations 26 Jun 2020 Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data.

Model Selection

Towards Understanding Hierarchical Learning: Benefits of Neural Representations

no code implementations NeurIPS 2020 Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher

When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

no code implementations ICLR 2020 Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao

Specifically, we show that for any fixed iteration $T$, when the adversarial perturbation during training has proper bounded L2 norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum L2 norm margin classifier at the rate of $O(1/\sqrt{T})$, significantly faster than the rate $O(1/\log T)$ of training with clean data.

Deep Reinforcement Learning with Robust and Smooth Policy

no code implementations 21 Mar 2020 Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

Deep reinforcement learning (RL) has achieved great empirical successes in various domains.

reinforcement-learning

Transformer Hawkes Process

2 code implementations ICML 2020 Simiao Zuo, Haoming Jiang, Zichong Li, Tuo Zhao, Hongyuan Zha

Modern data acquisition routinely produces massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets.

Point Processes

Differentiable Top-k Operator with Optimal Transport

no code implementations 16 Feb 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective

no code implementations 14 Feb 2020 Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.

Statistical Guarantees of Generative Adversarial Networks for Distribution Estimation

no code implementations 10 Feb 2020 Minshuo Chen, Wenjing Liao, Hongyuan Zha, Tuo Zhao

This paper provides statistical guarantees of GANs for the estimation of data distributions which have densities in a Hölder space.

Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds

no code implementations NeurIPS 2019 Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao

The network size scales exponentially in the approximation error, with an exponent depending on the intrinsic dimension of the data and the smoothness of the function.

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

5 code implementations ACL 2020 Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Linguistic Acceptability Natural Language Inference +4

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing

1 code implementation ACL 2020 Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao

To overcome this limitation, we propose a novel multi-domain NMT model using individual modules for each domain, on which we apply word-level, adaptive and layer-wise domain mixing.

Machine Translation Transfer Learning +1

On Generalization Bounds of a Family of Recurrent Neural Networks

no code implementations ICLR 2019 Minshuo Chen, Xingguo Li, Tuo Zhao

We remark: (1) Our generalization bound for vanilla RNNs is significantly tighter than the best of existing results; (2) We are not aware of any other generalization bounds for MGU, LSTM, and Conv RNNs in the existing literature; (3) We demonstrate the advantages of these variants in generalization.

Generalization Bounds PAC learning

Towards Understanding the Importance of Shortcut Connections in Residual Networks

no code implementations NeurIPS 2019 Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao

We show, however, that gradient descent combined with proper normalization, avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball.

Towards Understanding the Importance of Noise in Training Neural Networks

no code implementations 7 Sep 2019 Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao

Ample empirical evidence has corroborated that noise plays a crucial role in effective and efficient training of neural networks.

Meta Learning with Relational Information for Short Sequences

1 code implementation NeurIPS 2019 Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha

This paper proposes a new meta-learning method -- named HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences) for learning heterogeneous point process models from short event sequence data along with a relational network.

Meta-Learning

Nonparametric Regression on Low-Dimensional Manifolds using Deep ReLU Networks : Function Approximation and Statistical Recovery

no code implementations NeurIPS 2019 Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao

It therefore demonstrates the adaptivity of deep ReLU networks to low-dimensional geometric structures of data, and partially explains the power of deep ReLU networks in tackling high-dimensional data with low-dimensional geometric structures.

Inductive Bias of Gradient Descent based Adversarial Training on Separable Data

no code implementations 7 Jun 2019 Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao

Specifically, we show that when the adversarial perturbation during training has bounded $\ell_2$-norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$-norm margin classifier at the rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, significantly faster than the rate $\mathcal{O}(1/\log T)$ of training with clean data.

Inductive Bias

On Scalable and Efficient Computation of Large Scale Optimal Transport

no code implementations ICLR Workshop DeepGenStruct 2019 Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha

Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its widespread use.

Domain Adaptation

On Tighter Generalization Bounds for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations ICLR 2019 Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We propose a generalization error bound for a general family of deep neural networks based on the depth and width of the networks, as well as the spectral norm of weight matrices.

Generalization Bounds

Learning to Defense by Learning to Attack

no code implementations ICLR Workshop DeepGenStruct 2019 Zhehui Chen, Haoming Jiang, Yuyang Shi, Bo Dai, Tuo Zhao

From the perspective of generative learning, our proposed method can be viewed as learning a deep generative model for generating adversarial samples, which is adaptive to the robust classification.

Adversarial Attack Robust classification

On Computation and Generalization of GANs with Spectrum Control

no code implementations 28 Dec 2018 Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao

Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate the spectra of the weight matrices through various regularizers and constraints, without intensively computing singular value decompositions.

Learning to Defend by Learning to Attack

no code implementations 3 Nov 2018 Haoming Jiang, Zhehui Chen, Yuyang Shi, Bo Dai, Tuo Zhao

Adversarial training provides a principled approach for training robust neural networks.

Adversarial Attack Adversarial Defense +2

Provable Gaussian Embedding with One Observation

no code implementations NeurIPS 2018 Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang

In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.

On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations 13 Jun 2018 Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We establish a margin based data dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks.

Generalization Bounds

On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization

no code implementations 13 Jun 2018 Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao

However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown.

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

no code implementations NeurIPS 2018 Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao

The asynchronous momentum stochastic gradient descent algorithm (Async-MSGD) is one of the most popular algorithms in distributed machine learning.

Stochastic Optimization

Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models

no code implementations 11 Mar 2018 Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao

We propose a nonparametric method for detecting nonlinear causal relationships within a set of multidimensional discrete time series, by using sparse additive models (SpAMs).

Additive models Model Selection +1

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

no code implementations NeurIPS 2018 Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao

Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.

Dimensionality Reduction Stochastic Optimization +1

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

no code implementations 14 Feb 2018 Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao

Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.

Bayesian Inference Dimensionality Reduction +1

Misspecified Nonconvex Statistical Optimization for Phase Retrieval

no code implementations 18 Dec 2017 Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.

Parametric Simplex Method for Sparse Learning

no code implementations NeurIPS 2017 Haotian Pang, Han Liu, Robert J. Vanderbei, Tuo Zhao

High dimensional sparse learning has imposed a great computational challenge to large scale data analysis.

Sparse Learning

Deep Hyperspherical Learning

no code implementations NeurIPS 2017 Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song

In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres.

Representation Learning

On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions

no code implementations 19 Jun 2017 Xingguo Li, Lin F. Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.

Sparse Learning

Online Factorization and Partition of Complex Networks From Random Walks

no code implementations 22 May 2017 Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang

We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.

Homotopy Parametric Simplex Method for Sparse Learning

no code implementations 4 Apr 2017 Haotian Pang, Robert Vanderbei, Han Liu, Tuo Zhao

High dimensional sparse learning has imposed a great computational challenge to large scale data analysis.

Sparse Learning

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

no code implementations 29 Dec 2016 Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao

We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks).

The Physical Systems Behind Optimization Algorithms

no code implementations NeurIPS 2018 Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao

We use differential equations based approaches to provide some physics insights into analyzing the dynamics of popular optimization algorithms in machine learning.

On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization

no code implementations 10 Jul 2016 Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Mingyi Hong

In particular, we first show that for a family of quadratic minimization problems, the iteration complexity $\mathcal{O}(\log^2(p)\cdot\log(1/\epsilon))$ of the CBCD-type methods matches that of the GD methods in terms of dependency on $p$, up to a $\log^2 p$ factor.

On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don't Worry About Its Nonsmooth Loss Function

no code implementations 25 May 2016 Xingguo Li, Haoming Jiang, Jarvis Haupt, Raman Arora, Han Liu, Mingyi Hong, Tuo Zhao

Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility.

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

no code implementations NeurIPS 2016 Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang

We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a nonsmooth regularizer.

Stochastic Optimization

Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction

no code implementations 9 May 2016 Xingguo Li, Raman Arora, Han Liu, Jarvis Haupt, Tuo Zhao

We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints.

Sparse Learning Stochastic Optimization

Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory

no code implementations 23 Dec 2014 Tuo Zhao, Han Liu, Tong Zhang

This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions.

Sparse Learning

Accelerated Mini-batch Randomized Block Coordinate Descent Method

no code implementations NeurIPS 2014 Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu

When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner.

Sparse Learning Stochastic Optimization

Multivariate Regression with Calibration

no code implementations NeurIPS 2014 Han Liu, Lie Wang, Tuo Zhao

We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models.

Activity Prediction

Sparse Inverse Covariance Estimation with Calibration

no code implementations NeurIPS 2013 Tuo Zhao, Han Liu

We propose a semiparametric procedure for estimating a high dimensional sparse inverse covariance matrix.

Model Selection

Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery

no code implementations 10 May 2013 Han Liu, Lie Wang, Tuo Zhao

We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models.

Activity Prediction
