no code implementations • ICML 2020 • Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao
In contrast to policies parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, for neural-network-based reinforcement learning algorithms there is no readily available solution for learning a smooth policy.
1 code implementation • 20 Apr 2025 • Lawrence Liu, Inesh Chakrabarti, Yixiao Li, Mengdi Wang, Tuo Zhao, Lin F. Yang
Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments.
no code implementations • 7 Mar 2025 • Yixiao Li, Xianzhi Du, Ajay Jaiswal, Tao Lei, Tuo Zhao, Chong Wang, Jianyu Wang
We study the enlarge-and-prune pipeline as an integrated system to address two critical questions: whether it is worth pretraining an enlarged model even when the model is never deployed, and how to optimize the entire pipeline for better pruned models.
no code implementations • 4 Mar 2025 • LiMing Liu, Zixuan Zhang, Simon Du, Tuo Zhao
Through this model, we rigorously prove the existence of progressive sharpening and self-stabilization under large learning rates, and establish non-asymptotic analysis of the training dynamics and sharpness along the entire GD trajectory.
1 code implementation • 24 Feb 2025 • LiMing Liu, Zhenghao Xu, Zixuan Zhang, Hao Kang, Zichong Li, Chen Liang, Weizhu Chen, Tuo Zhao
In this paper, we propose COSMOS, a novel hybrid optimizer that leverages the varying importance of eigensubspaces in the gradient matrix to achieve memory efficiency without compromising optimization performance.
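The entry above describes exploiting the varying importance of eigensubspaces of the gradient matrix. The snippet below is a hypothetical illustration of that general idea, not the published COSMOS update: a preconditioned step in the dominant singular subspace of the gradient and a cheap normalized step in its complement, with second-moment state kept only for the small projected block. The function name `hybrid_update` and all hyperparameters are invented for illustration.

```python
import numpy as np

def hybrid_update(W, G, state, r=8, lr=1e-3, beta=0.9, eps=1e-8):
    """One illustrative hybrid step: precondition the top-r singular subspace
    of the gradient, apply a plain normalized update to the remainder
    (hypothetical sketch, not the published COSMOS algorithm)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    U_r, Vt_r = U[:, :r], Vt[:r, :]            # dominant subspace of the gradient
    G_top = U_r @ np.diag(s[:r]) @ Vt_r        # projection onto it
    G_rest = G - G_top                         # complementary part
    proj = U_r.T @ G @ Vt_r.T                  # r x r coefficients
    # second-moment accumulator only for the small projected block (memory saving)
    state["v"] = beta * state.get("v", np.zeros_like(proj)) + (1 - beta) * proj**2
    precond_top = U_r @ (proj / (np.sqrt(state["v"]) + eps)) @ Vt_r
    rest = G_rest / (np.linalg.norm(G_rest) + eps)   # cheap normalized step
    return W - lr * (precond_top + rest)

W = np.random.randn(64, 32)
G = np.random.randn(64, 32)
W = hybrid_update(W, G, state={})
```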
no code implementations • 16 Feb 2025 • Tianci Liu, Haoxiang Jiang, Tianze Wang, Ran Xu, Yue Yu, Linjun Zhang, Tuo Zhao, Haoyu Wang
Large language models (LLMs) have achieved impressive performance but face high computational costs and latency, limiting their deployment in resource-constrained settings.
no code implementations • 12 Oct 2024 • Zhenghao Xu, Yuqing Wang, Tuo Zhao, Rachel Ward, Molei Tao
We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem.
no code implementations • 16 Sep 2024 • Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng
However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.
no code implementations • 10 Sep 2024 • Kuan Wang, Alexander Bukharin, Haoming Jiang, Qingyu Yin, Zhengyang Wang, Tuo Zhao, Jingbo Shang, Chao Zhang, Bing Yin, Xian Li, Jianshu Chen, Shiyang Li
However, existing models trained on open-source IFT datasets only have the ability to follow instructions from users, and often fail to follow the complex roles and rules specified by developers, a.k.a.
no code implementations • 21 Jun 2024 • Alexander Bukharin, Ilgee Hong, Haoming Jiang, Zichong Li, Qingru Zhang, Zixuan Zhang, Tuo Zhao
To tackle this challenge, we propose a robust RLHF approach -- $R^3M$, which models the potentially corrupted preference label as sparse outliers.
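A minimal sketch of the sparse-outlier idea described above, assuming a Bradley–Terry-style preference loss: a per-pair offset `delta` can absorb corrupted labels and is kept sparse by an L1 penalty. This is not the exact $R^3M$ objective; the names and the regularization weight are illustrative.

```python
import numpy as np

def robust_preference_loss(r_chosen, r_rejected, delta, lam=0.1):
    """Pairwise preference loss with a per-pair offset delta that can absorb
    corrupted labels; the L1 penalty keeps the offsets sparse (illustrative
    sketch of the sparse-outlier idea, not the exact R^3M objective)."""
    margin = r_chosen - r_rejected + delta
    nll = -np.log(1.0 / (1.0 + np.exp(-margin)))      # -log sigmoid(margin)
    return nll.mean() + lam * np.abs(delta).mean()

rng = np.random.default_rng(0)
r_c, r_r = rng.normal(size=100), rng.normal(size=100)
delta = np.zeros(100)                                  # learned jointly in practice
print(robust_preference_loss(r_c, r_r, delta))
```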
1 code implementation • 16 Jun 2024 • Haoyu Wang, Tianci Liu, Ruirui Li, Monica Cheng, Tuo Zhao, Jing Gao
By adding a sparsity constraint on the product of low-rank matrices and converting it to row and column-wise sparsity, we ensure efficient and precise model updates.
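A rough illustration of converting a low-rank update into row- and column-wise sparsity, as described above: keep only the most important rows of one factor and columns of the other, so the product touches few rows and columns of the edited weight. The norm-based scoring rule and the helper name are assumptions, not the paper's method.

```python
import numpy as np

def row_col_sparse_update(A, B, k_rows=4, k_cols=4):
    """Keep only the k most important rows of B and columns of A, so the
    low-rank product B @ A touches few rows/columns of the edited weight
    (illustrative sketch of the row/column-wise sparsity idea)."""
    row_score = np.linalg.norm(B, axis=1)    # importance of each output row
    col_score = np.linalg.norm(A, axis=0)    # importance of each input column
    B_sp = np.where(np.isin(np.arange(B.shape[0]),
                            np.argsort(row_score)[-k_rows:])[:, None], B, 0.0)
    A_sp = np.where(np.isin(np.arange(A.shape[1]),
                            np.argsort(col_score)[-k_cols:])[None, :], A, 0.0)
    return B_sp @ A_sp                        # sparse, localized weight update

delta = row_col_sparse_update(np.random.randn(8, 128), np.random.randn(64, 8))
print(np.count_nonzero(delta.any(axis=1)), "rows updated")
```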
no code implementations • 4 Jun 2024 • Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao
By incorporating an adaptive scaling parameter into the loss for each pair, our method increases the flexibility of the reward function.
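A minimal sketch of adding an adaptive per-pair scale to a pairwise reward-model loss, as the entry describes. How `tau` would actually be learned, and the clipping range used here, are assumptions.

```python
import numpy as np

def scaled_pairwise_loss(r_chosen, r_rejected, tau):
    """Pairwise reward-model loss with a per-pair scale tau controlling how
    sharply each comparison is enforced (illustrative sketch of the adaptive
    scaling idea, not the authors' exact formulation)."""
    margin = tau * (r_chosen - r_rejected)
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-margin))))

rng = np.random.default_rng(1)
r_c, r_r = rng.normal(size=32), rng.normal(size=32)
tau = np.clip(rng.lognormal(size=32), 0.1, 10.0)   # learned per pair in practice
print(scaled_pairwise_loss(r_c, r_r, tau))
```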
1 code implementation • 6 Apr 2024 • Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang
In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs.
no code implementations • 3 Apr 2024 • Hoang Huy Nguyen, Yan Li, Tuo Zhao
In modern decentralized applications, ensuring communication efficiency and user privacy are the key challenges.
2 code implementations • 8 Mar 2024 • Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
Key-value (KV) caching has become the de facto technique for accelerating generation in large language model (LLM) inference.
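For context, a minimal KV cache for single-head attention: keys and values of past tokens are stored once and reused at every decoding step, so each step attends in time linear in the prefix length instead of recomputing it. This only illustrates the caching mechanism the sentence refers to; the class name and dimensions are arbitrary.

```python
import numpy as np

class KVCache:
    """Minimal key-value cache for autoregressive attention: past keys and
    values are stored once and reused at every decoding step."""
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def step(self, q, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])
        scores = self.K @ q / np.sqrt(q.shape[0])
        w = np.exp(scores - scores.max()); w /= w.sum()
        return w @ self.V                     # attention output for this token

d = 16
cache = KVCache(d)
for _ in range(5):                            # five decoding steps
    q, k, v = (np.random.randn(d) for _ in range(3))
    out = cache.step(q, k, v)
print(cache.K.shape)                          # (5, 16): grows with sequence length
```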
no code implementations • 16 Feb 2024 • Haoyu Wang, Ruirui Li, Haoming Jiang, Jinjin Tian, Zhengyang Wang, Chen Luo, Xianfeng Tang, Monica Cheng, Tuo Zhao, Jing Gao
Retrieval-augmented Large Language Models (LLMs) offer substantial benefits in enhancing performance across knowledge-intensive scenarios.
no code implementations • 21 Nov 2023 • Alexander Bukharin, Shiyang Li, Zhengyang Wang, Jingfeng Yang, Bing Yin, Xian Li, Chao Zhang, Tuo Zhao, Haoming Jiang
QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study on the effect of diversity and quality on instruction tuning performance.
1 code implementation • 3 Nov 2023 • Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao
In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.
no code implementations • 26 Oct 2023 • Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao
This regularity, together with gradient descent using a large learning rate that favors flatter regions, results in these nontrivial dynamical behaviors.
no code implementations • 25 Oct 2023 • Zichong Li, Qunzhi Xu, Zhenghao Xu, Yajun Mei, Tuo Zhao, Hongyuan Zha
Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching and offers uncertainty quantification for the predicted event time, location and mark by computing confidence regions over the generated samples.
1 code implementation • 25 Oct 2023 • Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha
We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time.
no code implementations • 19 Oct 2023 • Qingru Zhang, Dhananjay Ram, Cole Hawkins, Sheng Zha, Tuo Zhao
These models leverage the attention mechanism to capture long- and short-range dependencies in the sequence.
1 code implementation • 12 Oct 2023 • Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao
Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning.
no code implementations • 25 Sep 2023 • Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0, 1]$ indicates the smoothness of environment.
no code implementations • 18 Sep 2023 • Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao
We consider the linear discriminant analysis problem in high-dimensional settings.
1 code implementation • 6 Sep 2023 • Alexander Bukharin, Yixiao Li, Pengcheng He, Tuo Zhao
Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals.
no code implementations • 24 Jul 2023 • Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang
A popular approach is to utilize human feedback to learn a reward function for training.
no code implementations • 4 Jul 2023 • Zixuan Zhang, Kaiqi Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang
Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom.
no code implementations • 26 Jun 2023 • Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wenjing Liao, Tuo Zhao
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures.
no code implementations • 20 Jun 2023 • Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao
Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons.
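A minimal sketch of the pruning-plus-low-rank interplay described above: factor the weight into a rank-r part plus a sparse residual, so pruning acts only on what the low-rank part cannot capture. The split rule here (SVD plus magnitude thresholding) is an illustrative assumption, not the paper's algorithm.

```python
import numpy as np

def lowrank_plus_sparse(W, r, keep_frac=0.05):
    """Approximate a weight matrix as (low-rank part) + (sparse part): the
    low-rank term captures the dominant shared directions, and pruning is
    applied only to the residual (illustrative sketch)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :r] * s[:r]) @ Vt[:r, :]           # rank-r approximation
    R = W - L
    thresh = np.quantile(np.abs(R), 1.0 - keep_frac)
    S = np.where(np.abs(R) >= thresh, R, 0.0)    # keep only the largest residuals
    return L, S

W = np.random.randn(64, 64)
L, S = lowrank_plus_sparse(W, r=8)
print(np.linalg.norm(W - (L + S)) / np.linalg.norm(W))   # relative reconstruction error
```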
1 code implementation • 5 Jun 2023 • Alexander Bukharin, Tianyi Liu, Shengjie Wang, Simiao Zuo, Weihao Gao, Wen Yan, Tuo Zhao
To address this issue, we propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.
2 code implementations • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao
Therefore, many fine-tuning methods have been proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., low-rank increments.
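As a concrete example of a low-rank increment, a minimal LoRA-style forward pass: the frozen weight is augmented by a trainable product B @ A. The scaling convention and initialization below follow common practice and are assumptions; the paper's own method further adapts the rank budget across weight matrices during fine-tuning.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha=16):
    """Parameter-efficient forward pass: the frozen weight W0 is augmented by
    a trainable low-rank increment B @ A (minimal LoRA-style sketch)."""
    r = A.shape[0]
    return x @ (W0 + (alpha / r) * (B @ A)).T

d_in, d_out, r = 128, 64, 4
x = np.random.randn(2, d_in)
W0 = np.random.randn(d_out, d_in)             # frozen pre-trained weight
A = np.random.randn(r, d_in) * 0.01           # trainable
B = np.zeros((d_out, r))                      # trainable, initialized to zero
print(lora_forward(x, W0, A, B).shape)        # (2, 64)
```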
no code implementations • 25 Feb 2023 • Biraj Dahal, Alex Havrilla, Minshuo Chen, Tuo Zhao, Wenjing Liao
Many existing experiments have demonstrated that generative networks can generate high-dimensional complex data from a low-dimensional easy-to-sample distribution.
no code implementations • 19 Feb 2023 • Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bing Yin, Tuo Zhao
Since the teacher model has a significantly larger capacity and stronger representation power than the student model, it is very difficult for the student to produce predictions that match the teacher's over a massive amount of open-domain training data.
no code implementations • 14 Feb 2023 • Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang
Furthermore, the generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
1 code implementation • 15 Dec 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao
Specifically, we augment a SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.
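A toy illustration of that layered design, with a stand-in diagonal state-space recurrence as the global bottom pass and windowed attention for the other layers. Everything below (the scalar recurrence coefficients, window size, function names) is a simplification for illustration, not the SPADE architecture.

```python
import numpy as np

def ssm_layer(x, A=0.9, B=0.1):
    """Toy diagonal state-space recurrence h_t = A*h_{t-1} + B*x_t: one cheap
    global pass over the whole sequence (stand-in for the SSM bottom layer)."""
    h, out = np.zeros(x.shape[1]), []
    for t in range(x.shape[0]):
        h = A * h + B * x[t]
        out.append(h.copy())
    return np.array(out)

def local_attention(x, window=4):
    """Each position attends only to the last `window` positions, so cost is
    linear in sequence length (stand-in for the upper layers)."""
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        ctx = x[max(0, t - window + 1): t + 1]
        w = np.exp(ctx @ x[t]); w /= w.sum()
        out[t] = w @ ctx
    return out

x = np.random.randn(32, 8)                    # (sequence length, model dim)
print(local_attention(ssm_layer(x)).shape)    # global SSM pass, then local attention
```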
no code implementations • 1 Dec 2022 • Jiahui Cheng, Minshuo Chen, Hao Liu, Tuo Zhao, Wenjing Liao
Label Shift has been widely believed to be harmful to the generalization performance of machine learning models.
1 code implementation • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao
As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.
no code implementations • 21 Sep 2022 • Yan Li, Guanghui Lan, Tuo Zhao
We consider the problem of solving robust Markov decision process (MDP), which involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels.
no code implementations • 15 Sep 2022 • Simiao Zuo, Tianyi Liu, Tuo Zhao, Hongyuan Zha
Point process models are of great importance in real-world applications.
no code implementations • 15 Sep 2022 • Simiao Zuo, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang, Tuo Zhao
The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network.
no code implementations • 15 Sep 2022 • Simiao Zuo, Haoming Jiang, Qingyu Yin, Xianfeng Tang, Bing Yin, Tuo Zhao
Specifically, we train a generator to recover identities of the masked edges, and simultaneously, we train a discriminator to distinguish the generated edges from the original graph's edges.
1 code implementation • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.
no code implementations • 9 Jun 2022 • Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao
Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness.
no code implementations • 6 Jun 2022 • Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks.
no code implementations • 4 May 2022 • Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie
Based on the approximation theory of neural networks, we show that the neural network IPM test has type-II risk of order $n^{-(s+\beta)/d}$, which is of the same order as the type-II risk of the Hölder IPM test.
1 code implementation • NAACL 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
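A minimal token-level mixture-of-experts feed-forward layer, to make the capacity/speed trade-off concrete: a gate routes each token to its top-k experts and only those experts run. The gating rule and expert shapes are generic assumptions, not MoEBERT's adaptation procedure.

```python
import numpy as np

def moe_ffn(x, experts, gate_W, k=1):
    """Token-level mixture-of-experts feed-forward: a gate routes each token
    to its top-k experts, so parameter count grows with the number of experts
    while per-token compute stays roughly constant (minimal sketch)."""
    logits = x @ gate_W                               # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]         # chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            W1, W2 = experts[e]
            out[t] += np.maximum(x[t] @ W1, 0) @ W2   # expert = ReLU MLP
    return out / k

d, n_experts = 32, 4
experts = [(np.random.randn(d, 4 * d), np.random.randn(4 * d, d)) for _ in range(n_experts)]
y = moe_ffn(np.random.randn(10, d), experts, np.random.randn(d, n_experts))
print(y.shape)
```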
1 code implementation • ACL 2022 • Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao
To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.
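A small sketch of consistency regularization across perturbed ensemble members, assuming the members are light perturbations around shared predictions: each member is pulled toward the ensemble mean via a KL term. How CAMERO constructs and weights the perturbed models is not reproduced here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_list):
    """Average KL divergence of each ensemble member to the mean prediction;
    members share base weights and differ only by small perturbations, so
    memory stays close to a single model (illustrative sketch)."""
    probs = [softmax(l) for l in logits_list]
    mean_p = np.mean(probs, axis=0)
    return np.mean([np.sum(p * (np.log(p + 1e-12) - np.log(mean_p + 1e-12)), axis=-1).mean()
                    for p in probs])

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 5))                     # shared "logits" for 8 examples
members = [base + 0.1 * rng.normal(size=base.shape) for _ in range(4)]
print(consistency_loss(members))
```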
no code implementations • NAACL 2022 • Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao, Chao Zhang
User sessions empower many search and recommendation tasks on a daily basis.
no code implementations • 7 Feb 2022 • Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao
We investigate the role of noise in optimization algorithms for learning over-parameterized models.
1 code implementation • ICLR 2022 • Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Analysis shows that the proposed schedule indeed reduces the redundancy and improves generalization performance.
no code implementations • 24 Jan 2022 • Yan Li, Guanghui Lan, Tuo Zhao
We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies.
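For reference, the basic policy mirror descent step with KL divergence that HPMD builds on: the new policy is the old one reweighted by exp(eta * Q) and renormalized per state. The homotopic regularization schedule of HPMD itself is omitted; the toy Q table and step size below are arbitrary.

```python
import numpy as np

def kl_pmd_step(pi, Q, eta):
    """One policy mirror descent step with KL divergence: reweight the old
    policy by exp(eta * Q) and renormalize per state (basic PMD update; HPMD
    additionally anneals a regularization term along the run)."""
    new = pi * np.exp(eta * Q)
    return new / new.sum(axis=1, keepdims=True)

n_states, n_actions = 3, 4
pi = np.full((n_states, n_actions), 1.0 / n_actions)   # uniform initial policy
Q = np.random.randn(n_states, n_actions)
for _ in range(50):
    pi = kl_pmd_step(pi, Q, eta=0.5)
print(pi.argmax(axis=1) == Q.argmax(axis=1))           # converges to greedy actions
```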
no code implementations • 15 Jan 2022 • Guanghui Lan, Yan Li, Tuo Zhao
Despite the nonconvex nature of the problem and a partial update rule, we provide a unified analysis for several sampling schemes, and show that BPMD achieves fast linear convergence to the global optimality.
no code implementations • 6 Jan 2022 • Siawpeng Er, Edward Liu, Minshuo Chen, Yan Li, Yuqi Liu, Tuo Zhao, Hua Wang
This paper presents a deep learning assisted synthesis approach for direct end-to-end generation of RF/mm-wave passive matching network with 3D EM structures.
no code implementations • 1 Jan 2022 • Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao
Learning operators between infinite-dimensional spaces is an important learning task arising in a wide range of applications in machine learning, imaging science, mathematical modeling and simulation, etc.
1 code implementation • NeurIPS 2021 • Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao
Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but independent of the number of agents.
no code implementations • ICLR 2022 • Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan
We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent.
1 code implementation • ICLR 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao
While most ongoing research focuses on improving SAMs by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.
no code implementations • ICLR 2022 • Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao
Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that magnitudes of $X$ and $Y$ at the limit of GD iterations will be close even if their initialization is significantly unbalanced.
no code implementations • 29 Sep 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha
To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
no code implementations • 16 Sep 2021 • Zhigen Zhao, Simiao Zuo, Tuo Zhao, Ye Zhao
Recent advances in combining trajectory optimization with function approximation (especially neural networks) show promise in learning complex control policies for diverse tasks in robot systems.
1 code implementation • Findings (EMNLP) 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization can improve model generalization in many natural language processing tasks.
no code implementations • Findings (NAACL) 2022 • Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha
In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels.
no code implementations • 7 Sep 2021 • Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao
Most existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data.
no code implementations • 19 Aug 2021 • Danqing Zhang, Zheng Li, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao, Qiang Yang
We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms.
no code implementations • 15 Aug 2021 • Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao
For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound of the margin obtained by BPPA with respect to an arbitrarily chosen norm.
1 code implementation • ACL 2021 • Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao
Unfortunately, we observe that weakly labeled data does not necessarily improve, and can even deteriorate, the model performance (due to the extensive noise in the weak labels) when we train deep NER models on a simple or weighted combination of the strongly labeled and weakly labeled data.
1 code implementation • ACL 2021 • Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.
no code implementations • 18 May 2021 • Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha
To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
no code implementations • 3 May 2021 • Siawpeng Er, Shihao Yang, Tuo Zhao
The global spread of COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, has posed a significant threat to mankind.
1 code implementation • EMNLP 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.
no code implementations • Findings (EMNLP) 2021 • Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.
no code implementations • 1 Mar 2021 • Jiachen Yang, Tarik Dzanic, Brenden Petersen, Jun Kudo, Ketan Mittal, Vladimir Tomov, Jean-Sylvain Camier, Tuo Zhao, Hongyuan Zha, Tzanio Kolev, Robert Anderson, Daniel Faissol
Large-scale finite element simulations of complex physical systems governed by partial differential equations (PDE) crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required.
no code implementations • 24 Feb 2021 • Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao
A large body of empirical evidence has corroborated the importance of noise in nonconvex optimization problems.
1 code implementation • EMNLP 2021 • Haoming Jiang, Bo Dai, Mengjiao Yang, Tuo Zhao, Wei Wei
An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments.
no code implementations • NeurIPS 2020 • Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao
We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.
no code implementations • NeurIPS 2020 • Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
Finding the k largest or smallest elements from a collection of scores, i.e., the top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining.
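For concreteness, the plain, non-differentiable top-k operation; the paper's contribution is a smoothed surrogate of this operation, which this snippet does not implement.

```python
import numpy as np

def topk(scores, k):
    """Plain (non-differentiable) top-k: return the indices and values of the
    k largest scores, largest first."""
    idx = np.argpartition(scores, -k)[-k:]        # k largest, in arbitrary order
    idx = idx[np.argsort(scores[idx])[::-1]]      # sort the selected k
    return idx, scores[idx]

idx, vals = topk(np.array([0.1, 2.3, -1.0, 0.7, 1.5]), k=2)
print(idx, vals)                                  # [1 4] [2.3 1.5]
```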
no code implementations • ICLR 2021 • Yujia Xie, Yixiu Mao, Simiao Zuo, Hongteng Xu, Xiaojing Ye, Tuo Zhao, Hongyuan Zha
Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and are limited to linear regression models.
no code implementations • 3 Nov 2020 • Minshuo Chen, Hao Liu, Wenjing Liao, Tuo Zhao
Our theory shows that deep neural networks are adaptive to the low-dimensional geometric structures of the covariates, and partially explains the success of deep learning for causal inference.
1 code implementation • EMNLP 2020 • Lingkai Kong, Haoming Jiang, Yuchen Zhuang, Jie Lyu, Tuo Zhao, Chao Zhang
Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization.
no code implementations • NeurIPS Workshop LMCA 2020 • Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
The top-$k$ operation, i.e., finding the $k$ largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.
1 code implementation • NAACL 2021 • Yue Yu, Simiao Zuo, Haoming Jiang, Wendi Ren, Tuo Zhao, Chao Zhang
To address this problem, we develop a contrastive self-training framework, COSINE, to enable fine-tuning LMs with weak supervision.
Ranked #1 on Word Sense Disambiguation on Words in Context
no code implementations • 12 Oct 2020 • Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong
A common practice in meta-learning is to perform a train-validation split (the train-val method), where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.
no code implementations • 25 Aug 2020 • David Munzer, Siawpeng Er, Minshuo Chen, Yan Li, Naga S. Mannem, Tuo Zhao, Hua Wang
We propose using machine learning models for the direct synthesis of on-chip electromagnetic (EM) passive structures to enable rapid or even automated designs and optimizations of RF/mm-Wave circuits.
1 code implementation • 28 Jun 2020 • Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, Tuo Zhao, Chao Zhang
We study the open-domain named entity recognition (NER) problem under distant supervision.
no code implementations • 27 Jun 2020 • Xingguo Li, Tuo Zhao, Xiaoming Yuan, Han Liu
This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME).
1 code implementation • 27 Jun 2020 • Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao
We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies.
no code implementations • 26 Jun 2020 • Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman
We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data.
no code implementations • NeurIPS 2020 • Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher
When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.
no code implementations • ICLR 2020 • Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao
Specifically, we show that for any fixed iteration $T$, when the adversarial perturbation during training has a properly bounded $\ell_2$ norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$ norm margin classifier at the rate of $O(1/\sqrt{T})$, significantly faster than the rate $O(1/\log T)$ of training with clean data.
no code implementations • 21 Mar 2020 • Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao
Deep reinforcement learning (RL) has achieved great empirical successes in various domains.
3 code implementations • ICML 2020 • Simiao Zuo, Haoming Jiang, Zichong Li, Tuo Zhao, Hongyuan Zha
Modern data acquisition routinely produce massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets.
no code implementations • 16 Feb 2020 • Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.
no code implementations • 14 Feb 2020 • Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao
We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.
no code implementations • 10 Feb 2020 • Minshuo Chen, Wenjing Liao, Hongyuan Zha, Tuo Zhao
Generative Adversarial Networks (GANs) have achieved a great success in unsupervised learning.
no code implementations • ICLR 2020 • Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li, Zhaoran Wang, Tuo Zhao
Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies.
no code implementations • NeurIPS 2019 • Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao
The network size scales exponentially in the approximation error, with an exponent depending on the intrinsic dimension of the data and the smoothness of the function.
6 code implementations • ACL 2020 • Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao
However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.
Ranked #1 on Natural Language Inference on QNLI
1 code implementation • ACL 2020 • Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao
To overcome this limitation, we propose a novel multi-domain NMT model using individual modules for each domain, on which we apply word-level, adaptive and layer-wise domain mixing.
no code implementations • ICLR 2019 • Minshuo Chen, Xingguo Li, Tuo Zhao
We remark: (1) Our generalization bound for vanilla RNNs is significantly tighter than the best of existing results; (2) We are not aware of any other generalization bounds for MGU, LSTM, and Conv RNNs in the existing literature; (3) We demonstrate the advantages of these variants in generalization.
no code implementations • NeurIPS 2019 • Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao
We show, however, that gradient descent combined with proper normalization, avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball.
no code implementations • 7 Sep 2019 • Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao
A large body of empirical evidence has corroborated that noise plays a crucial role in the effective and efficient training of neural networks.
1 code implementation • NeurIPS 2019 • Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha
This paper proposes a new meta-learning method -- named HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences) for learning heterogeneous point process models from short event sequence data along with a relational network.
no code implementations • NeurIPS 2019 • Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao
It therefore demonstrates the adaptivity of deep ReLU networks to low-dimensional geometric structures of data, and partially explains the power of deep ReLU networks in tackling high-dimensional data with low-dimensional geometric structures.
no code implementations • 7 Jun 2019 • Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao
Specifically, we show that when the adversarial perturbation during training has bounded $\ell_2$-norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$-norm margin classifier at the rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, significantly faster than the rate $\mathcal{O}(1/\log T)$ of training with clean data.
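For a linear classifier, the $\ell_2$-bounded worst-case perturbation used in this kind of adversarial training has a closed form, computed below. This is only meant to make the training setup concrete; it does not reproduce the paper's analysis, and the variable names are illustrative.

```python
import numpy as np

def l2_adversarial_example(x, y, w, eps):
    """Worst-case l2-bounded perturbation of a linear classifier f(x) = <w, x>:
    for a label y in {-1, +1}, the margin y*<w, x> is minimized by moving x a
    distance eps against y*w (closed form for the linear case)."""
    return x - eps * y * w / np.linalg.norm(w)

w = np.array([1.0, 2.0])
x, y, eps = np.array([3.0, 1.0]), 1, 0.5
x_adv = l2_adversarial_example(x, y, w, eps)
print(w @ x, w @ x_adv)     # clean margin 5.0 vs. reduced adversarial margin
```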
no code implementations • ICLR Workshop DeepGenStruct 2019 • Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha
Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its widespread use.
no code implementations • ICLR 2019 • Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao
Generative Adversarial Networks (GANs), though powerful, are hard to train.
no code implementations • ICLR 2019 • Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao
We propose a generalization error bound for a general family of deep neural networks based on the depth and width of the networks, as well as the spectral norm of weight matrices.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Zhehui Chen, Haoming Jiang, Yuyang Shi, Bo Dai, Tuo Zhao
From the perspective of generative learning, our proposed method can be viewed as learning a deep generative model for generating adversarial samples, which is adaptive to the robust classification.
no code implementations • 28 Dec 2018 • Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao
Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate the spectra of the weight matrices through various regularizers and constraints, without intensively computing singular value decompositions.
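A minimal sketch of reparameterizing a weight matrix through its spectrum: keep orthonormal factors and expose the singular values as free parameters, here squashed into (0, 1) so the spectral norm is bounded. The specific squashing function, and the omission of how the orthonormal factors are parameterized during training, are assumptions of this sketch.

```python
import numpy as np

def spectral_weight(U, s_raw, Vt, s_max=1.0):
    """Reparameterize a weight matrix through its singular values: U and Vt
    are orthonormal factors, and the spectrum is an elementwise function of
    the free parameters s_raw, squashed into (0, s_max)."""
    s = s_max / (1.0 + np.exp(-s_raw))          # spectrum under direct control
    return (U * s) @ Vt

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(16, 8)))            # orthonormal columns
Vt = np.linalg.qr(rng.normal(size=(8, 8)))[0].T          # orthogonal
W = spectral_weight(U, rng.normal(size=8), Vt)
print(np.linalg.svd(W, compute_uv=False).max() <= 1.0)   # spectral norm capped
```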
no code implementations • 3 Nov 2018 • Haoming Jiang, Zhehui Chen, Yuyang Shi, Bo Dai, Tuo Zhao
Adversarial training provides a principled approach for training robust neural networks.
no code implementations • NeurIPS 2018 • Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang
In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.
no code implementations • 13 Jun 2018 • Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao
However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown.
no code implementations • 13 Jun 2018 • Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao
We establish a margin based data dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks.
no code implementations • NeurIPS 2018 • Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao
Asynchronous momentum stochastic gradient descent (Async-MSGD) is one of the most popular algorithms in distributed machine learning.
no code implementations • 11 Mar 2018 • Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao
We propose a nonparametric method for detecting nonlinear causal relationship within a set of multidimensional discrete time series, by using sparse additive models (SpAMs).
no code implementations • NeurIPS 2018 • Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao
Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.
no code implementations • 14 Feb 2018 • Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
no code implementations • 18 Dec 2017 • Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.
no code implementations • NeurIPS 2017 • Xingguo Li, Lin Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.
no code implementations • NeurIPS 2017 • Haotian Pang, Han Liu, Robert J. Vanderbei, Tuo Zhao
High dimensional sparse learning has imposed a great computational challenge to large scale data analysis.
no code implementations • NeurIPS 2017 • Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song
In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres.
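A one-unit sketch of an angular (hyperspherical) response: the output depends only on the angle between the weight and the input, here through the linear map g(theta) = 1 - 2*theta/pi. The convolutional arrangement and the paper's other response functions are omitted.

```python
import numpy as np

def sphere_response(x, w):
    """Hyperspherical (angular) response: instead of the inner product <w, x>,
    the unit outputs a function of the angle between w and x only, discarding
    magnitudes (linear g(theta) = 1 - 2*theta/pi in this sketch)."""
    cos = (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return 1.0 - 2.0 * theta / np.pi            # in [-1, 1], maximal when aligned

x = np.array([1.0, 1.0, 0.0])
print(sphere_response(x, np.array([2.0, 2.0, 0.0])),    # aligned -> 1.0
      sphere_response(x, np.array([-1.0, -1.0, 0.0])))  # opposite -> -1.0
```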
no code implementations • ICML 2017 • Zhehui Chen, Lin F. Yang, Chris Junchi Li, Tuo Zhao
Multiview representation learning is popular for latent factor analysis.
no code implementations • 19 Jun 2017 • Xingguo Li, Lin F. Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.
no code implementations • 22 May 2017 • Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang
We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.
no code implementations • 4 Apr 2017 • Haotian Pang, Robert Vanderbei, Han Liu, Tuo Zhao
High dimensional sparse learning has imposed a great computational challenge to large scale data analysis.
no code implementations • 27 Feb 2017 • Zhehui Chen, Lin F. Yang, Chris J. Li, Tuo Zhao
Multiview representation learning is very popular for latent factor analysis.
no code implementations • 29 Dec 2016 • Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao
We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks).
no code implementations • NeurIPS 2018 • Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao
We use differential equations based approaches to provide some physics insights into analyzing the dynamics of popular optimization algorithms in machine learning.
no code implementations • 10 Jul 2016 • Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Mingyi Hong
In particular, we first show that for a family of quadratic minimization problems, the iteration complexity $\mathcal{O}(\log^2(p)\cdot\log(1/\epsilon))$ of the CBCD-type methods matches that of the GD methods in terms of dependency on $p$, up to a $\log^2 p$ factor.
no code implementations • NeurIPS 2016 • Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang
We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a nonsmooth regularizer.
no code implementations • 25 May 2016 • Xingguo Li, Haoming Jiang, Jarvis Haupt, Raman Arora, Han Liu, Mingyi Hong, Tuo Zhao
Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility.
no code implementations • 9 May 2016 • Xingguo Li, Raman Arora, Han Liu, Jarvis Haupt, Tuo Zhao
We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints.
no code implementations • NeurIPS 2015 • Tuo Zhao, Zhaoran Wang, Han Liu
We study the estimation of low rank matrices via nonconvex optimization.
no code implementations • 23 Dec 2014 • Tuo Zhao, Han Liu, Tong Zhang
This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions.
no code implementations • NeurIPS 2014 • Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu
When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner.
no code implementations • NeurIPS 2014 • Han Liu, Lie Wang, Tuo Zhao
We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models.
no code implementations • NeurIPS 2013 • Tuo Zhao, Han Liu
We propose a semiparametric procedure for estimating high dimensional sparse inverse covariance matrix.
no code implementations • 10 May 2013 • Han Liu, Lie Wang, Tuo Zhao
We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models.