no code implementations • ICML 2020 • Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao
In contrast to policies parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, there is no readily available solution for learning a smooth policy with neural-network-based reinforcement learning algorithms.
no code implementations • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.
no code implementations • 9 Jun 2022 • Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao
Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness.
no code implementations • 6 Jun 2022 • Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
no code implementations • 4 May 2022 • Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie
Based on the approximation theory of neural networks, we show that the neural network IPM test has a type-II risk of order $n^{-(s+\beta)/d}$, which is the same order as the type-II risk of the H\"older IPM test.
1 code implementation • 15 Apr 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
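The entry does not spell out the architecture, so as a rough, hypothetical illustration of how a Mixture-of-Experts feed-forward layer routes tokens to experts (not MoEBERT's exact design; all shapes and names below are made up), a top-1 gating sketch might look like:

```python
import numpy as np

def moe_ffn(x, gate_w, expert_w1, expert_w2):
    """Toy top-1 Mixture-of-Experts feed-forward layer.

    x:         (num_tokens, d_model) token representations
    gate_w:    (d_model, num_experts) router weights
    expert_w1: (num_experts, d_model, d_ff) first FFN weight of each expert
    expert_w2: (num_experts, d_ff, d_model) second FFN weight of each expert
    """
    logits = x @ gate_w                                # (num_tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    chosen = probs.argmax(axis=-1)                     # top-1 expert per token

    out = np.zeros_like(x)
    for e in range(gate_w.shape[1]):
        idx = np.where(chosen == e)[0]
        if idx.size == 0:
            continue                                   # this expert receives no tokens
        h = np.maximum(x[idx] @ expert_w1[e], 0.0)     # expert FFN with ReLU activation
        # scale by the gate probability (in a trained MoE this is what lets
        # the router receive a gradient signal)
        out[idx] = (h @ expert_w2[e]) * probs[idx, e:e + 1]
    return out

# tiny smoke test
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
y = moe_ffn(x,
            gate_w=rng.normal(size=(16, 4)),
            expert_w1=rng.normal(size=(4, 16, 32)),
            expert_w2=rng.normal(size=(4, 32, 16)))
print(y.shape)  # (8, 16)
```

Because only one expert runs per token, the total parameter count (capacity) can grow with the number of experts without a proportional increase in per-token inference cost.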
1 code implementation • ACL 2022 • Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao
To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.
no code implementations • 8 Apr 2022 • Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao, Chao Zhang
User sessions empower many search and recommendation tasks on a daily basis.
no code implementations • 7 Feb 2022 • Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao
We investigate the role of noise in optimization algorithms for learning over-parameterized models.
1 code implementation • ICLR 2022 • Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Analysis shows that the proposed schedule indeed reduces the redundancy and improves generalization performance.
no code implementations • 24 Jan 2022 • Yan Li, Tuo Zhao, Guanghui Lan
The superlinear convergence takes effect after at most $\mathcal{O}(\log(1/\Delta^*))$ iterations, where $\Delta^*$ is defined via a gap quantity associated with the optimal state-action value function; (2) HPMD also exhibits last-iterate convergence of the policy, with the limiting policy corresponding exactly to the optimal policy with the maximal entropy for every state.
no code implementations • 15 Jan 2022 • Guanghui Lan, Yan Li, Tuo Zhao
Compared to the traditional PG methods with a batch update rule, which visits and updates the policy for every state, BPMD methods have cheap per-iteration computation via a partial update rule that performs the policy update on a sampled state.
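For intuition, with a KL mirror map the per-iteration work reduces to a softmax-style update of the policy at the single sampled state. The sketch below is a simplified, hypothetical illustration of such a partial update (step-size schedules, state sampling, and policy evaluation are all abstracted away; the action values at the sampled state are assumed to come from some evaluation oracle):

```python
import numpy as np

def partial_mirror_descent_update(policy, s, q_row, eta):
    """Update the policy only at the sampled state s.

    policy: (num_states, num_actions) row-stochastic array.
    q_row:  estimated action values Q(s, .) at state s (assumed given).
    eta:    step size.
    With a KL mirror map, the mirror-descent step is a multiplicative /
    softmax update applied to the single row s; all other rows are untouched.
    """
    logits = np.log(policy[s] + 1e-12) + eta * q_row
    logits -= logits.max()                 # numerical stability
    new_row = np.exp(logits)
    policy = policy.copy()
    policy[s] = new_row / new_row.sum()
    return policy

# toy usage: 3 states, 2 actions, uniform initial policy
policy = np.full((3, 2), 0.5)
policy = partial_mirror_descent_update(policy, s=1, q_row=np.array([1.0, 0.0]), eta=1.0)
print(np.round(policy, 3))                 # only the row for state 1 changes
```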
no code implementations • 6 Jan 2022 • Siawpeng Er, Edward Liu, Minshuo Chen, Yan Li, Yuqi Liu, Tuo Zhao, Hua Wang
This paper presents a deep learning assisted synthesis approach for direct end-to-end generation of RF/mm-wave passive matching network with 3D EM structures.
no code implementations • 1 Jan 2022 • Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao
Learning operators between infinite-dimensional spaces is an important learning task with wide applications in machine learning, imaging science, mathematical modeling and simulations, etc.
1 code implementation • NeurIPS 2021 • Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao
Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class parameterizing the value function; notably, the gap is independent of the number of agents.
no code implementations • ICLR 2022 • Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan
We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent.
1 code implementation • ICLR 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao
While most ongoing research focuses on improving SAMs by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.
no code implementations • ICLR 2022 • Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao
Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that magnitudes of $X$ and $Y$ at the limit of GD iterations will be close even if their initialization is significantly unbalanced.
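The balancing effect can be seen on a toy scalar factorization. The sketch below is a hypothetical illustration, not the paper's setting: it replaces the matrices $X$ and $Y$ with scalars $x$ and $y$ and runs plain gradient descent on $f(x, y) = \frac{1}{2}(xy - 1)^2$ from an unbalanced initialization:

```python
import numpy as np

def gd_factorization(lr, steps=2000, x0=3.0, y0=0.2):
    """Gradient descent on f(x, y) = 0.5 * (x * y - 1)^2."""
    x, y = x0, y0
    for _ in range(steps):
        r = x * y - 1.0                        # residual
        x, y = x - lr * r * y, y - lr * r * x  # simultaneous GD update
    return x, y

for lr in (0.01, 0.3):                         # small vs. large learning rate
    x, y = gd_factorization(lr)
    print(f"lr={lr}: x={x:.3f}, y={y:.3f}, |x|-|y|={abs(x) - abs(y):.3f}")
```

With the small learning rate the iterates converge to a minimizer close to the unbalanced initialization, whereas the larger learning rate typically ends with $|x|$ and $|y|$ much closer in magnitude, which is the balancing behavior described above.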
no code implementations • 29 Sep 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha
To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
no code implementations • 16 Sep 2021 • Zhigen Zhao, Simiao Zuo, Tuo Zhao, Ye Zhao
Recent advances in combining trajectory optimization with function approximation (especially neural networks) show promise in learning complex control policies for diverse tasks in robot systems.
no code implementations • 15 Sep 2021 • Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha
In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels.
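To make the teacher and student roles concrete, the following is a generic self-training loop (a simplified sketch of the standard recipe, not this paper's specific algorithm; student_fit and student_predict are hypothetical stand-ins for any classifier's training and prediction routines):

```python
import numpy as np

def self_train(student_fit, student_predict, x_labeled, y_labeled, x_unlabeled,
               rounds=5, confidence=0.9):
    """Generic self-training: the current model acts as the teacher and
    generates pseudo-labels on unlabeled data; a student is then retrained
    on the labeled data plus the confident pseudo-labels.

    student_fit(x, y) -> model; student_predict(model, x) -> class probabilities.
    """
    model = student_fit(x_labeled, y_labeled)           # initial teacher
    for _ in range(rounds):
        probs = student_predict(model, x_unlabeled)     # teacher's soft predictions
        keep = probs.max(axis=1) >= confidence          # keep confident pseudo-labels only
        pseudo_y = probs.argmax(axis=1)[keep]
        x_aug = np.concatenate([x_labeled, x_unlabeled[keep]])
        y_aug = np.concatenate([y_labeled, pseudo_y])
        model = student_fit(x_aug, y_aug)               # student becomes the next teacher
    return model
```

Any probabilistic classifier can supply the two callbacks, for instance scikit-learn's LogisticRegression with its fit and predict_proba methods.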
1 code implementation • Findings (EMNLP) 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization can improve model generalization in many natural language processing tasks.
no code implementations • 7 Sep 2021 • Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao
Most existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data.
no code implementations • 19 Aug 2021 • Danqing Zhang, Zheng Li, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao, Qiang Yang
We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms.
no code implementations • 15 Aug 2021 • Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao
We show that BPPA attains non-trivial margin, which closely depends on the condition number of the distance generating function inducing the Bregman divergence.
1 code implementation • ACL 2021 • Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao
Unfortunately, we observe that weakly labeled data does not necessarily improve, and may even deteriorate, the model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data.
1 code implementation • ACL 2021 • Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of ``lottery tickets'', and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.
no code implementations • 18 May 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha
To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
no code implementations • 3 May 2021 • Siawpeng Er, Shihao Yang, Tuo Zhao
The global spread of COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, poses a significant threat to mankind.
1 code implementation • EMNLP 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.
no code implementations • Findings (EMNLP) 2021 • Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.
no code implementations • 1 Mar 2021 • Jiachen Yang, Tarik Dzanic, Brenden Petersen, Jun Kudo, Ketan Mittal, Vladimir Tomov, Jean-Sylvain Camier, Tuo Zhao, Hongyuan Zha, Tzanio Kolev, Robert Anderson, Daniel Faissol
Large-scale finite element simulations of complex physical systems governed by partial differential equations (PDE) crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required.
no code implementations • 24 Feb 2021 • Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao
A large body of empirical evidence has corroborated the importance of noise in nonconvex optimization problems.
1 code implementation • EMNLP 2021 • Haoming Jiang, Bo Dai, Mengjiao Yang, Tuo Zhao, Wei Wei
An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments.
no code implementations • NeurIPS 2020 • Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
Finding the k largest or smallest elements from a collection of scores, i.e., the top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining.
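As a rough sketch of how the top-k operation can be smoothed so that gradients can flow through it, the following casts top-k selection as a tiny entropy-regularized optimal transport problem solved with Sinkhorn iterations; this is a generic relaxation in the spirit of this line of work, not necessarily the paper's exact formulation:

```python
import numpy as np

def soft_topk(scores, k, eps=0.1, n_iter=200):
    """Smoothed top-k membership via entropy-regularized optimal transport.

    The n scores (mass 1/n each) are transported onto two anchors (the min
    and max score) carrying masses (n - k)/n and k/n; the rescaled mass sent
    to the 'max' anchor is a soft, differentiable indicator of top-k membership.
    """
    s = np.asarray(scores, dtype=float)
    n = s.size
    anchors = np.array([s.min(), s.max()])
    C = (s[:, None] - anchors[None, :]) ** 2        # (n, 2) transport costs
    K = np.exp(-C / eps)                            # Gibbs kernel
    a = np.full(n, 1.0 / n)                         # source masses
    b = np.array([(n - k) / n, k / n])              # target masses
    u, v = np.ones(n), np.ones(2)
    for _ in range(n_iter):                         # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]              # (n, 2) transport plan
    return plan[:, 1] * n                           # soft indicator in [0, 1]

scores = np.array([0.3, 2.0, -1.0, 1.5, 0.1])
print(np.round(soft_topk(scores, k=2, eps=0.05), 3))  # close to 1 for the two largest scores
```

Shrinking eps recovers the hard top-k indicator, while larger values trade accuracy for smoother, better-conditioned gradients.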
no code implementations • NeurIPS 2020 • Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao
We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.
no code implementations • ICLR 2021 • Yujia Xie, Yixiu Mao, Simiao Zuo, Hongteng Xu, Xiaojing Ye, Tuo Zhao, Hongyuan Zha
Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and are limited to linear regression models.
no code implementations • 3 Nov 2020 • Minshuo Chen, Hao Liu, Wenjing Liao, Tuo Zhao
Our theory shows that deep neural networks are adaptive to the low-dimensional geometric structures of the covariates, and partially explains the success of deep learning for causal inference.
1 code implementation • EMNLP 2020 • Lingkai Kong, Haoming Jiang, Yuchen Zhuang, Jie Lyu, Tuo Zhao, Chao Zhang
Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization.
no code implementations • NeurIPS Workshop LMCA 2020 • Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
The top-$k$ operation, i.e., finding the $k$ largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.
1 code implementation • NAACL 2021 • Yue Yu, Simiao Zuo, Haoming Jiang, Wendi Ren, Tuo Zhao, Chao Zhang
To address this problem, we develop a contrastive self-training framework, COSINE, to enable fine-tuning LMs with weak supervision.
Ranked #1 on Word Sense Disambiguation on Words in Context
no code implementations • 12 Oct 2020 • Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong
A common practice in meta-learning is to perform a train-validation split (the "train-val" method), where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.
no code implementations • 25 Aug 2020 • David Munzer, Siawpeng Er, Minshuo Chen, Yan Li, Naga S. Mannem, Tuo Zhao, Hua Wang
We propose using machine learning models for the direct synthesis of on-chip electromagnetic (EM) passive structures to enable rapid or even automated designs and optimizations of RF/mm-Wave circuits.
1 code implementation • 28 Jun 2020 • Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, Chao Zhang
We study the open-domain named entity recognition (NER) problem under distant supervision.
1 code implementation • 27 Jun 2020 • Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao
We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies.
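To illustrate what pathwise coordinate optimization means in the simplest case, here is a minimal lasso solver that sweeps coordinates over a decreasing grid of regularization parameters with warm starts; this is a didactic sketch only, not the picasso implementation or its API:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_path(X, y, n_lambda=20, n_sweeps=100):
    """Pathwise coordinate descent for the lasso
         min_beta 0.5/n * ||y - X beta||^2 + lam * ||beta||_1
    over a decreasing grid of lam values, warm-starting each problem at the
    previous solution (the 'pathwise' part)."""
    n, p = X.shape
    col_norm = (X ** 2).sum(axis=0) / n
    lam_max = np.abs(X.T @ y).max() / n                 # smallest lam giving beta = 0
    lambdas = lam_max * np.logspace(0, -2, n_lambda)    # decreasing grid
    beta, residual, path = np.zeros(p), y.copy(), []
    for lam in lambdas:
        for _ in range(n_sweeps):
            for j in range(p):                          # coordinate-wise updates
                old = beta[j]
                rho = X[:, j] @ residual / n + col_norm[j] * old
                beta[j] = soft_threshold(rho, lam) / col_norm[j]
                if beta[j] != old:
                    residual -= X[:, j] * (beta[j] - old)
        path.append((lam, beta.copy()))
    return path

# tiny synthetic check: only the first two coefficients carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)
lam, beta = lasso_path(X, y)[-1]
print(np.round(beta, 2))   # roughly (3, -2, 0, ..., 0) at the smallest lam
```

Active-set strategies, as mentioned in the abstract, would further restrict each sweep to the coordinates that are currently nonzero or likely to become nonzero.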
no code implementations • 27 Jun 2020 • Xingguo Li, Tuo Zhao, Xiaoming Yuan, Han Liu
This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME).
no code implementations • 26 Jun 2020 • Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman
We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data.
no code implementations • NeurIPS 2020 • Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher
When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.
no code implementations • ICLR 2020 • Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao
Specifically, we show that for any fixed iteration $T$, when the adversarial perturbation during training has a properly bounded L2 norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum L2 norm margin classifier at the rate of $O(1/\sqrt{T})$, significantly faster than the rate $O(1/\log T)$ of training with clean data.
no code implementations • 21 Mar 2020 • Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao
Deep reinforcement learning (RL) has achieved great empirical successes in various domains.
2 code implementations • ICML 2020 • Simiao Zuo, Haoming Jiang, Zichong Li, Tuo Zhao, Hongyuan Zha
Modern data acquisition routinely produces massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets.
no code implementations • 16 Feb 2020 • Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.
no code implementations • 14 Feb 2020 • Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao
We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.
no code implementations • 10 Feb 2020 • Minshuo Chen, Wenjing Liao, Hongyuan Zha, Tuo Zhao
This paper provides statistical guarantees of GANs for the estimation of data distributions which have densities in a H\"{o}lder space.
no code implementations • ICLR 2020 • Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li, Zhaoran Wang, Tuo Zhao
Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies.
no code implementations • NeurIPS 2019 • Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao
The network size scales exponentially in the approximation error, with an exponent depending on the intrinsic dimension of the data and the smoothness of the function.
5 code implementations • ACL 2020 • Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao
However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.
Ranked #1 on Semantic Textual Similarity on MRPC
1 code implementation • ACL 2020 • Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao
To overcome this limitation, we propose a novel multi-domain NMT model using individual modules for each domain, on which we apply word-level, adaptive and layer-wise domain mixing.
no code implementations • ICLR 2019 • Minshuo Chen, Xingguo Li, Tuo Zhao
We remark: (1) Our generalization bound for vanilla RNNs is significantly tighter than the best of existing results; (2) We are not aware of any other generalization bounds for MGU, LSTM, and Conv RNNs in the existing literature; (3) We demonstrate the advantages of these variants in generalization.
no code implementations • NeurIPS 2019 • Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao
We show, however, that gradient descent combined with proper normalization, avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball.
no code implementations • 7 Sep 2019 • Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao
A large body of empirical evidence has corroborated that noise plays a crucial role in the effective and efficient training of neural networks.
1 code implementation • NeurIPS 2019 • Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha
This paper proposes a new meta-learning method -- named HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences) for learning heterogeneous point process models from short event sequence data along with a relational network.
no code implementations • NeurIPS 2019 • Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao
It therefore demonstrates the adaptivity of deep ReLU networks to low-dimensional geometric structures of data, and partially explains the power of deep ReLU networks in tackling high-dimensional data with low-dimensional geometric structures.
no code implementations • 7 Jun 2019 • Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao
Specifically, we show that when the adversarial perturbation during training has bounded $\ell_2$-norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$-norm margin classifier at the rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, significantly faster than the rate $\mathcal{O}(1/\log T)$ of training with clean data.
no code implementations • ICLR 2019 • Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao
Generative Adversarial Networks (GANs), though powerful, are hard to train.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha
Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its widespread use.
no code implementations • ICLR 2019 • Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao
We propose a generalization error bound for a general family of deep neural networks based on the depth and width of the networks, as well as the spectral norm of weight matrices.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Zhehui Chen, Haoming Jiang, Yuyang Shi, Bo Dai, Tuo Zhao
From the perspective of generative learning, our proposed method can be viewed as learning a deep generative model for generating adversarial samples, which is adaptive to the robust classification.
no code implementations • 28 Dec 2018 • Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao
Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate the spectra of the weight matrices through various regularizers and constraints, without intensively computing singular value decompositions.
no code implementations • 3 Nov 2018 • Haoming Jiang, Zhehui Chen, Yuyang Shi, Bo Dai, Tuo Zhao
Adversarial training provides a principled approach for training robust neural networks.
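For context on the term, standard adversarial training alternates an inner maximization that crafts perturbed inputs with an outer minimization of the loss on those inputs. A common PGD-style baseline (shown here only as background, not this paper's proposed method) can be sketched as:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner maximization: projected gradient ascent on the loss within an
    L-infinity ball of radius eps around the clean input x (pixels in [0, 1])."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                     # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0).detach()
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: one optimizer step on the adversarial examples."""
    model.eval()                       # freeze batch-norm statistics while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```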
no code implementations • NeurIPS 2018 • Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang
In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.
no code implementations • 13 Jun 2018 • Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao
We establish a margin based data dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks.
no code implementations • 13 Jun 2018 • Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao
However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown.
no code implementations • NeurIPS 2018 • Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao
Asynchronous momentum stochastic gradient descent (Async-MSGD) is one of the most popular algorithms in distributed machine learning.
no code implementations • 11 Mar 2018 • Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao
We propose a nonparametric method for detecting nonlinear causal relationship within a set of multidimensional discrete time series, by using sparse additive models (SpAMs).
no code implementations • NeurIPS 2018 • Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao
Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.
no code implementations • 14 Feb 2018 • Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
no code implementations • 18 Dec 2017 • Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.
no code implementations • NeurIPS 2017 • Haotian Pang, Han Liu, Robert J. Vanderbei, Tuo Zhao
High dimensional sparse learning has imposed a great computational challenge to large scale data analysis.
no code implementations • NeurIPS 2017 • Xingguo Li, Lin Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.
no code implementations • NeurIPS 2017 • Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song
In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres.
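The key departure from an ordinary convolution is that the filter response depends only on the angle between the kernel and the input patch rather than on their inner product. A minimal sketch of such an angular response (simplified; the operator names follow the general idea rather than the paper's exact definitions) is:

```python
import numpy as np

def sphere_response(w, x, operator="linear"):
    """Angular response between a filter w and an input patch x (both flattened).

    Instead of the inner product <w, x>, the output is a function g(theta) of
    the angle theta between w and x, so it is insensitive to their magnitudes."""
    cos = np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    if operator == "cosine":
        return np.cos(theta)                  # plain cosine similarity
    if operator == "linear":
        return 1.0 - 2.0 * theta / np.pi      # in [-1, 1], linear in the angle
    raise ValueError(operator)

w = np.array([1.0, 0.0])
for x in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    print(x, round(sphere_response(w, np.array(x)), 2))   # 1.0, 0.0, -1.0
```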
no code implementations • ICML 2017 • Zhehui Chen, Lin F. Yang, Chris Junchi Li, Tuo Zhao
Multiview representation learning is popular for latent factor analysis.
no code implementations • 19 Jun 2017 • Xingguo Li, Lin F. Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.
no code implementations • 22 May 2017 • Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang
We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.
no code implementations • 4 Apr 2017 • Haotian Pang, Robert Vanderbei, Han Liu, Tuo Zhao
High dimensional sparse learning has imposed a great computational challenge to large scale data analysis.
no code implementations • 27 Feb 2017 • Zhehui Chen, Lin F. Yang, Chris J. Li, Tuo Zhao
Multiview representation learning is very popular for latent factor analysis.
no code implementations • 29 Dec 2016 • Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao
We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks).
no code implementations • NeurIPS 2018 • Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao
We use differential equation based approaches to provide some physics insights into analyzing the dynamics of popular optimization algorithms in machine learning.
no code implementations • 10 Jul 2016 • Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Mingyi Hong
In particular, we first show that for a family of quadratic minimization problems, the iteration complexity $\mathcal{O}(\log^2(p)\cdot\log(1/\epsilon))$ of the CBCD-type methods matches that of the GD methods in terms of dependency on $p$, up to a $\log^2 p$ factor.
no code implementations • 25 May 2016 • Xingguo Li, Haoming Jiang, Jarvis Haupt, Raman Arora, Han Liu, Mingyi Hong, Tuo Zhao
Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility.
no code implementations • NeurIPS 2016 • Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang
We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a nonsmooth regularizer.
no code implementations • 9 May 2016 • Xingguo Li, Raman Arora, Han Liu, Jarvis Haupt, Tuo Zhao
We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints.
no code implementations • NeurIPS 2015 • Tuo Zhao, Zhaoran Wang, Han Liu
We study the estimation of low rank matrices via nonconvex optimization.
no code implementations • 23 Dec 2014 • Tuo Zhao, Han Liu, Tong Zhang
This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions.
no code implementations • NeurIPS 2014 • Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu
When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner.
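Block separability is what makes each randomized update cheap: the proximal step decouples across blocks, so only the sampled block needs to be touched. The sketch below applies RBCD to a group-lasso problem (a generic illustration under these assumptions, not the paper's accelerated mini-batch method):

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def rbcd_group_lasso(X, y, blocks, lam=0.1, n_iter=3000, seed=0):
    """Randomized block coordinate descent for
         min_beta 0.5/n * ||y - X beta||^2 + lam * sum_g ||beta_g||_2.
    The penalty is block separable, so each iteration performs a proximal
    gradient step on one randomly chosen block only."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, residual = np.zeros(p), y.copy()
    for _ in range(n_iter):
        g = blocks[rng.integers(len(blocks))]     # sample a block uniformly at random
        Xg = X[:, g]
        L = np.linalg.norm(Xg, 2) ** 2 / n        # block Lipschitz constant (recomputed for simplicity)
        grad = -Xg.T @ residual / n               # gradient w.r.t. the sampled block
        new = group_soft_threshold(beta[g] - grad / L, lam / L)
        residual -= Xg @ (new - beta[g])
        beta[g] = new
    return beta

# toy data: only the first block carries signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X[:, :2] @ np.array([2.0, -1.5]) + 0.1 * rng.normal(size=200)
blocks = [np.arange(0, 2), np.arange(2, 5), np.arange(5, 8)]
print(np.round(rbcd_group_lasso(X, y, blocks, lam=0.05), 2))  # first block near (2, -1.5), rest near 0
```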
no code implementations • NeurIPS 2014 • Han Liu, Lie Wang, Tuo Zhao
We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models.
no code implementations • NeurIPS 2013 • Tuo Zhao, Han Liu
We propose a semiparametric procedure for estimating a high dimensional sparse inverse covariance matrix.
no code implementations • 10 May 2013 • Han Liu, Lie Wang, Tuo Zhao
We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models.