Search Results for author: Tuo Zhao

Found 128 papers, 29 papers with code

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

1 code implementation EMNLP 2021 Haoming Jiang, Bo Dai, Mengjiao Yang, Tuo Zhao, Wei Wei

An ideal environment for evaluating dialog systems, also known as the Turing test, requires human interaction, which is usually not affordable for large-scale experiments.

Model-based Reinforcement Learning, Off-policy evaluation

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

6 code implementations ACL 2020 Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Linguistic Acceptability, Natural Language Inference
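
For intuition, here is a minimal sketch of the smoothness-inducing regularization idea that SMART builds on: penalize how much predictions change under a small perturbation of the input embeddings. The `model` (mapping embeddings to logits) and the random perturbation are illustrative assumptions; the paper instead finds an adversarial perturbation by projected gradient steps and adds Bregman proximal point updates.

```python
import torch
import torch.nn.functional as F

def smoothness_regularizer(model, embeds, eps=1e-3):
    # Reference predictions on the clean embeddings (treated as fixed).
    with torch.no_grad():
        p = F.softmax(model(embeds), dim=-1)
    # Random unit perturbation; SMART instead searches for the worst-case
    # perturbation inside an eps-ball with projected gradient ascent.
    noise = eps * F.normalize(torch.randn_like(embeds), dim=-1)
    q_log = F.log_softmax(model(embeds + noise), dim=-1)
    # Penalize divergence between clean and perturbed predictions.
    return F.kl_div(q_log, p, reduction="batchmean")
```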

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations 18 Mar 2023 Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods have been proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., as low-rank increments.

Question Answering, Text Generation
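
As background for the low-rank increments mentioned above, a minimal sketch of a LoRA-style layer: the pre-trained weight is frozen and only a rank-r update is trained. AdaLoRA goes further by parameterizing the increment in an SVD-like form and adaptively pruning singular values to allocate the rank budget; that part is not shown, and all names and dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained weight W0 plus a trainable low-rank increment B @ A."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        self.A = nn.Parameter(0.01 * torch.randn(rank, in_dim))  # trainable
        self.B = nn.Parameter(torch.zeros(out_dim, rank))        # zero init: no update at start
        self.scale = alpha / rank

    def forward(self, x):
        # Effective weight is W0 + scale * B @ A; only A and B receive gradients.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T
```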

Transformer Hawkes Process

3 code implementations ICML 2020 Simiao Zuo, Haoming Jiang, Zichong Li, Tuo Zhao, Hongyuan Zha

Modern data acquisition routinely produces massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets.

Computational Efficiency, Point Processes
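
For context, the classical Hawkes process that this paper generalizes models the event intensity as a baseline rate plus exponentially decaying excitation from past events; Transformer Hawkes replaces the fixed parametric kernel with a Transformer over the event history. A small sketch with illustrative parameters:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    past = np.array([ti for ti in event_times if ti < t])
    return mu + (alpha * np.exp(-beta * (t - past))).sum()

print(hawkes_intensity(2.0, [0.5, 1.2, 1.9]))  # intensity spikes after recent events
```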

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

1 code implementation 12 Oct 2023 Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao

Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning.

Natural Language Understanding, Quantization
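
A rough sketch of the alternating initialization idea behind LoftQ: jointly choose a quantized backbone Q and a low-rank pair (B, A) so that Q + BA stays close to the original weight. The uniform quantizer below is a stand-in (the paper works with NF4/uniform quantization), and the rank, bit width, and iteration count are illustrative.

```python
import torch

def loftq_style_init(W, rank=16, n_iter=5, n_bits=4):
    B = torch.zeros(W.shape[0], rank)
    A = torch.zeros(rank, W.shape[1])
    for _ in range(n_iter):
        # Quantize the part not explained by the current low-rank term.
        R = W - B @ A
        scale = R.abs().max() / (2 ** (n_bits - 1) - 1)
        Q = torch.round(R / scale).clamp(-2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1) * scale
        # Refit the low-rank term to the quantization residual via truncated SVD.
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        B = U[:, :rank] * S[:rank]
        A = Vh[:rank]
    return Q, B, A
```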

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

1 code implementation ACL 2021 Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao

Unfortunately, we observe that weakly labeled data does not necessarily improve, and can even deteriorate, model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data.

Named Entity Recognition

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

1 code implementation 3 Nov 2023 Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.
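
A minimal sketch of the post-hoc steering idea: during generation, downweight attention to tokens outside a user-highlighted span on selected heads and renormalize, so the model "reads" the emphasized text more. The names `scores`, `highlight_mask`, and `alpha` are illustrative; the paper additionally profiles which heads to steer.

```python
import torch

def steer_attention(scores, highlight_mask, alpha=0.01):
    """scores: (..., seq_len) raw attention logits for one head;
    highlight_mask: bool tensor marking user-emphasized tokens."""
    probs = torch.softmax(scores, dim=-1)
    probs = torch.where(highlight_mask, probs, probs * alpha)  # demote the rest
    return probs / probs.sum(dim=-1, keepdim=True)             # renormalize
```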

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

1 code implementation 27 Jun 2020 Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression, and scaled sparse linear regression), combined with efficient active set selection strategies.

regression, Sparse Learning
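
The pathwise idea is easy to state: solve a sequence of problems from the largest regularization level down to the smallest, warm-starting each from the previous solution, with a closed-form soft-thresholding coordinate update. A minimal NumPy sketch for the lasso only (the library itself also covers logistic/Poisson losses, nonconvex penalties, and active-set tricks):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_path(X, y, lambdas, n_sweeps=50):
    """Pathwise coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    beta, path = np.zeros(p), []
    for lam in sorted(lambdas, reverse=True):        # warm starts along the path
        for _ in range(n_sweeps):
            for j in range(p):
                r = y - X @ beta + X[:, j] * beta[j]  # partial residual
                beta[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
        path.append(beta.copy())
    return path
```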

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

1 code implementation 8 Mar 2024 Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Key-value (KV) caching has become the de facto technique for accelerating generation in large language model (LLM) inference.

Quantization
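
A toy sketch of the recipe's ingredients: separate a few large-magnitude outlier entries, uniformly quantize the rest, and (in the full recipe) approximate the remaining residual with a low-rank term. Bit width and outlier fraction below are illustrative.

```python
import torch

def compress_kv(kv, n_bits=4, outlier_frac=0.01):
    flat = kv.abs().flatten()
    k = max(1, int(outlier_frac * flat.numel()))
    thresh = flat.topk(k).values.min()
    outliers = torch.where(kv.abs() >= thresh, kv, torch.zeros_like(kv))
    rest = kv - outliers                 # nearly outlier-free, easier to quantize
    scale = rest.abs().max() / (2 ** (n_bits - 1) - 1)
    q = torch.round(rest / scale).clamp(-2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    # GEAR additionally fits a low-rank approximation to the residual rest - q * scale.
    return q.to(torch.int8), scale, outliers.to_sparse()
```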

Taming Sparsely Activated Transformer with Stochastic Experts

1 code implementation ICLR 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao

While most ongoing research focuses on improving sparsely activated models (SAMs) by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect: the commonly used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.

Machine Translation, Translation
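
A condensed sketch of the stochastic-experts idea: route each input to a uniformly random expert (no learned gate) and, during training, regularize two randomly routed forward passes toward consistent predictions. The two-pass consistency term below is a simplification of the paper's objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomlyRoutedExperts(nn.Module):
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x):
        idx = torch.randint(len(self.experts), (1,)).item()  # uniform random routing
        return self.experts[idx](x)

def consistency_loss(logits_a, logits_b):
    """Symmetric KL between two randomly routed passes over the same batch."""
    log_a, log_b = F.log_softmax(logits_a, -1), F.log_softmax(logits_b, -1)
    return 0.5 * (F.kl_div(log_a, log_b.exp(), reduction="batchmean")
                  + F.kl_div(log_b, log_a.exp(), reduction="batchmean"))
```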

Efficient Long Sequence Modeling via State Space Augmented Transformer

1 code implementation 15 Dec 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao

Specifically, we augment an SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.

Computational Efficiency, Language Modelling
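
For the local-attention layers, a naive causal sliding-window sketch: each position attends only to the previous `window` positions, so cost grows linearly in sequence length rather than quadratically. Shapes and the explicit loop are illustrative (real implementations vectorize this).

```python
import torch

def causal_local_attention(q, k, v, window=64):
    """q, k, v: (seq_len, dim); each query sees at most `window` past keys."""
    L, d = q.shape
    out = torch.empty_like(v)
    for i in range(L):
        lo = max(0, i - window + 1)
        w = torch.softmax(q[i] @ k[lo:i + 1].T / d ** 0.5, dim=-1)
        out[i] = w @ v[lo:i + 1]
    return out
```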

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

1 code implementation EMNLP 2020 Lingkai Kong, Haoming Jiang, Yuchen Zhuang, Jie Lyu, Tuo Zhao, Chao Zhang

Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization.

Language Modelling, Out of Distribution (OOD) Detection
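
Miscalibration of this kind is usually quantified with the expected calibration error (ECE): bin predictions by confidence and average the gap between confidence and accuracy. A standard sketch (the bin count is a common default, not from the paper):

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """confidence: max predicted probability per example; correct: 0/1 array."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by bin mass
    return ece
```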

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

1 code implementation 4 Oct 2022 Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.

Language Modelling, Model Compression
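
The mechanism reduces to matching filtered hidden states layer by layer: each model gets a small task-trained "filter" head, and the student imitates only what the filters keep. A minimal sketch with illustrative dimensions:

```python
import torch.nn as nn
import torch.nn.functional as F

def ted_layer_loss(student_h, teacher_h, student_filter, teacher_filter):
    """Match task-relevant projections of aligned layers, not raw hidden states."""
    return F.mse_loss(student_filter(student_h), teacher_filter(teacher_h))

# One filter pair per aligned layer; the filters are first trained on the task.
student_filter = nn.Linear(384, 128)
teacher_filter = nn.Linear(768, 128)
```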

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation ACL 2021 Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.

Model Compression, Multi-Task Learning
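
Tickets are typically extracted by magnitude pruning: keep the largest weights and zero the rest, yielding a binary mask that defines the subnetwork. A minimal one-shot sketch (lottery-ticket studies often prune iteratively with rewinding):

```python
import torch

def magnitude_mask(weight, sparsity=0.5):
    """Return a 0/1 mask keeping the largest-magnitude entries of `weight`."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    thresh = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > thresh).float()

mask = magnitude_mask(torch.randn(256, 256), sparsity=0.7)
# Training then proceeds with weight * mask, i.e., only the "ticket" is active.
```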

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

1 code implementation ACL 2022 Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.

Ensemble Learning

Meta Learning with Relational Information for Short Sequences

1 code implementation NeurIPS 2019 Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha

This paper proposes a new meta-learning method, HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences), for learning heterogeneous point process models from short event sequence data along with a relational network.

Meta-Learning

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

1 code implementation NeurIPS 2021 Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao

Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff}/\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but is independent of the number of agents.

Multi-agent Reinforcement Learning

Deep Reinforcement Learning with Hierarchical Reward Modeling

1 code implementation 6 Sep 2023 Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao

Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals.

Reinforcement Learning (RL)

Machine Learning Force Fields with Data Cost Aware Training

1 code implementation 5 Jun 2023 Alexander Bukharin, Tianyi Liu, Shengjie Wang, Simiao Zuo, Weihao Gao, Wen Yan, Tuo Zhao

To address this issue, we propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.

To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

1 code implementation 6 Apr 2024 Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang

In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs.

On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations 13 Jun 2018 Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We establish a margin based data dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks.

Generalization Bounds

On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization

no code implementations 13 Jun 2018 Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao

However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown.

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

no code implementations NeurIPS 2018 Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao

Asynchronous momentum stochastic gradient descent (Async-MSGD) is one of the most popular algorithms in distributed machine learning.

Stochastic Optimization

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

no code implementations NeurIPS 2018 Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao

Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.

Dimensionality Reduction, Stochastic Optimization

Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models

no code implementations 11 Mar 2018 Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao

We propose a nonparametric method for detecting nonlinear causal relationship within a set of multidimensional discrete time series, by using sparse additive models (SpAMs).

Additive models, Model Selection

On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions

no code implementations 19 Jun 2017 Xingguo Li, Lin F. Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.

Sparse Learning

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

no code implementations 14 Feb 2018 Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao

Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.

Bayesian Inference, Dimensionality Reduction

On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don't Worry About Its Nonsmooth Loss Function

no code implementations 25 May 2016 Xingguo Li, Haoming Jiang, Jarvis Haupt, Raman Arora, Han Liu, Mingyi Hong, Tuo Zhao

Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility.

regression

Deep Hyperspherical Learning

no code implementations NeurIPS 2017 Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song

In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres.

Representation Learning

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

no code implementations 29 Dec 2016 Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao

We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks).

Retrieval

Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction

no code implementations 9 May 2016 Xingguo Li, Raman Arora, Han Liu, Jarvis Haupt, Tuo Zhao

We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints.

Sparse Learning, Stochastic Optimization

Misspecified Nonconvex Statistical Optimization for Phase Retrieval

no code implementations 18 Dec 2017 Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.

Retrieval

Online Factorization and Partition of Complex Networks From Random Walks

no code implementations 22 May 2017 Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang

We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.

Clustering

Homotopy Parametric Simplex Method for Sparse Learning

no code implementations 4 Apr 2017 Haotian Pang, Robert Vanderbei, Han Liu, Tuo Zhao

High dimensional sparse learning has imposed a great computational challenge on large-scale data analysis.

regression, Sparse Learning

On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization

no code implementations 10 Jul 2016 Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Mingyi Hong

In particular, we first show that for a family of quadratic minimization problems, the iteration complexity $\mathcal{O}(\log^2(p)\cdot\log(1/\epsilon))$ of the CBCD-type methods matches that of the GD methods in terms of dependency on $p$, up to a $\log^2 p$ factor.

regression

The Physical Systems Behind Optimization Algorithms

no code implementations NeurIPS 2018 Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao

We use differential equations based approaches to provide some physics insights into analyzing the dynamics of popular optimization algorithms in machine learning.

BIG-bench Machine Learning

Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory

no code implementations 23 Dec 2014 Tuo Zhao, Han Liu, Tong Zhang

This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions.

Sparse Learning

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

no code implementations NeurIPS 2016 Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang

We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a nonsmooth regularizer.

Stochastic Optimization

Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery

no code implementations 10 May 2013 Han Liu, Lie Wang, Tuo Zhao

We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models.

Activity Prediction, regression

Provable Gaussian Embedding with One Observation

no code implementations NeurIPS 2018 Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang

In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.

BIG-bench Machine Learning

Learning to Defend by Learning to Attack

no code implementations 3 Nov 2018 Haoming Jiang, Zhehui Chen, Yuyang Shi, Bo Dai, Tuo Zhao

Adversarial training provides a principled approach for training robust neural networks.

Adversarial Attack, Adversarial Defense
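
For contrast with the learned attacker proposed in this paper, the classical inner step of adversarial training crafts a perturbation by hand, e.g., one signed-gradient (FGSM) step; the paper replaces hand-designed rules like this with an attacker network trained jointly with the defender. A minimal sketch of the classical step:

```python
import torch

def fgsm_example(model, loss_fn, x, y, eps=0.03):
    """One-step inner maximization: perturb x along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.detach()
```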

Parametric Simplex Method for Sparse Learning

no code implementations NeurIPS 2017 Haotian Pang, Han Liu, Robert J. Vanderbei, Tuo Zhao

High dimensional sparse learning has imposed a great computational challenge on large-scale data analysis.

Sparse Learning

Multivariate Regression with Calibration

no code implementations NeurIPS 2014 Han Liu, Lie Wang, Tuo Zhao

We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models.

Activity Prediction, regression

Accelerated Mini-batch Randomized Block Coordinate Descent Method

no code implementations NeurIPS 2014 Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu

When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner.

Sparse Learning, Stochastic Optimization

Sparse Inverse Covariance Estimation with Calibration

no code implementations NeurIPS 2013 Tuo Zhao, Han Liu

We propose a semiparametric procedure for estimating a high dimensional sparse inverse covariance matrix.

Model Selection

On Tighter Generalization Bounds for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations ICLR 2019 Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We propose a generalization error bound for a general family of deep neural networks based on the depth and width of the networks, as well as the spectral norm of weight matrices.

Generalization Bounds

On Computation and Generalization of GANs with Spectrum Control

no code implementations 28 Dec 2018 Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao

Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate the spectra of the weight matrices through various regularizers and constraints, without intensively computing singular value decompositions.

On Scalable and Efficient Computation of Large Scale Optimal Transport

no code implementations ICLR Workshop DeepGenStruct 2019 Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha

Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its widespread use.

Domain Adaptation

Inductive Bias of Gradient Descent based Adversarial Training on Separable Data

no code implementations 7 Jun 2019 Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao

Specifically, we show that when the adversarial perturbation during training has bounded $\ell_2$-norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$-norm margin classifier at the rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, significantly faster than the rate $\mathcal{O}(1/\log T)$ of training with clean data.

Binary Classification, Inductive Bias

Nonparametric Regression on Low-Dimensional Manifolds using Deep ReLU Networks : Function Approximation and Statistical Recovery

no code implementations NeurIPS 2019 Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao

It therefore demonstrates the adaptivity of deep ReLU networks to low-dimensional geometric structures of data, and partially explains the power of deep ReLU networks in tackling high-dimensional data with low-dimensional geometric structures.

regression

Towards Understanding the Importance of Noise in Training Neural Networks

no code implementations 7 Sep 2019 Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao

Much empirical evidence has corroborated that noise plays a crucial role in the effective and efficient training of neural networks.

Towards Understanding the Importance of Shortcut Connections in Residual Networks

no code implementations NeurIPS 2019 Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao

We show, however, that gradient descent combined with proper normalization, avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball.

On Generalization Bounds of a Family of Recurrent Neural Networks

no code implementations ICLR 2019 Minshuo Chen, Xingguo Li, Tuo Zhao

We remark: (1) Our generalization bound for vanilla RNNs is significantly tighter than the best of existing results; (2) We are not aware of any other generalization bounds for MGU, LSTM, and Conv RNNs in the existing literature; (3) We demonstrate the advantages of these variants in generalization.

Generalization Bounds, PAC learning

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing

1 code implementation ACL 2020 Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao

To overcome this limitation, we propose a novel multi-domain NMT model using individual modules for each domain, on which we apply word-level, adaptive and layer-wise domain mixing.

Machine Translation, NMT

Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds

no code implementations NeurIPS 2019 Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao

The network size scales exponentially in the approximation error, with an exponent depending on the intrinsic dimension of the data and the smoothness of the function.

Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks

no code implementations 10 Feb 2020 Minshuo Chen, Wenjing Liao, Hongyuan Zha, Tuo Zhao

Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning.

Differentiable Top-k Operator with Optimal Transport

no code implementations 16 Feb 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval, Retrieval
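
A compact sketch of the optimal-transport view: cast top-k selection as transporting n scores onto two anchors ("selected" near the largest score, "rejected" near the smallest) with column masses k/n and (n-k)/n; entropic regularization plus Sinkhorn iterations makes the resulting soft membership differentiable. The anchors, epsilon, and iteration count below are illustrative simplifications of the paper's formulation.

```python
import torch

def soft_topk(scores, k, eps=0.1, n_iter=200):
    n = scores.shape[0]
    anchors = torch.stack([scores.max(), scores.min()])  # selected / rejected
    C = (scores[:, None] - anchors[None, :]) ** 2         # transport costs
    K = torch.exp(-C / eps)
    r = torch.full((n,), 1.0 / n)                         # row marginals
    c = torch.tensor([k / n, (n - k) / n])                # column marginals
    u = torch.ones(n)
    for _ in range(n_iter):                               # Sinkhorn scaling
        v = c / (K.T @ u)
        u = r / (K @ v)
    P = u[:, None] * K * v[None, :]
    return n * P[:, 0]   # soft indicator of membership in the top-k set

print(soft_topk(torch.tensor([0.1, 2.0, 0.3, 1.5]), k=2))  # ~[0, 1, 0, 1]
```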

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective

no code implementations 14 Feb 2020 Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.

Towards Understanding Hierarchical Learning: Benefits of Neural Representations

no code implementations NeurIPS 2020 Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher

When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.

The huge Package for High-dimensional Undirected Graph Estimation in R

no code implementations 26 Jun 2020 Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data.

Model Selection, Vocal Bursts Intensity Prediction

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

no code implementations 27 Jun 2020 Xingguo Li, Tuo Zhao, Xiaoming Yuan, Han Liu

This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME).

regression

Residual Network Based Direct Synthesis of EM Structures: A Study on One-to-One Transformers

no code implementations 25 Aug 2020 David Munzer, Siawpeng Er, Minshuo Chen, Yan Li, Naga S. Mannem, Tuo Zhao, Hua Wang

We propose using machine learning models for the direct synthesis of on-chip electromagnetic (EM) passive structures to enable rapid or even automated designs and optimizations of RF/mm-Wave circuits.

BIG-bench Machine Learning

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

no code implementations ICLR 2020 Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao

Specifically, we show that for any fixed iteration $T$, when the adversarial perturbation during training has a properly bounded L2 norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum L2 norm margin classifier at the rate of $O(1/\sqrt{T})$, significantly faster than the rate $O(1/\log T)$ of training with clean data.

Binary Classification

Deep Reinforcement Learning with Smooth Policy

no code implementations ICML 2020 Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

In contrast to policy parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, for neural network based reinforcement learning algorithms, there is no readily available solution to learn a smooth policy.

Reinforcement Learning (RL)

How Important is the Train-Validation Split in Meta-Learning?

no code implementations 12 Oct 2020 Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (\emph{train-val method}) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.

Meta-Learning

Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks

no code implementations 3 Nov 2020 Minshuo Chen, Hao Liu, Wenjing Liao, Tuo Zhao

Our theory shows that deep neural networks are adaptive to the low-dimensional geometric structures of the covariates, and partially explains the success of deep learning for causal inference.

Causal Inference

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

no code implementations NeurIPS 2020 Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.

Differentiable Top-k with Optimal Transport

no code implementations NeurIPS 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

Finding the k largest or smallest elements from a collection of scores, i.e., the top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining.

Information Retrieval, Retrieval

A Hypergradient Approach to Robust Regression without Correspondence

no code implementations ICLR 2021 Yujia Xie, Yixiu Mao, Simiao Zuo, Hongteng Xu, Xiaojing Ye, Tuo Zhao, Hongyuan Zha

Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and are limited to linear regression models.

Multi-Object Tracking, regression

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

no code implementations 24 Feb 2021 Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Much empirical evidence has corroborated the importance of noise in nonconvex optimization problems.

Reinforcement Learning for Adaptive Mesh Refinement

no code implementations 1 Mar 2021 Jiachen Yang, Tarik Dzanic, Brenden Petersen, Jun Kudo, Ketan Mittal, Vladimir Tomov, Jean-Sylvain Camier, Tuo Zhao, Hongyuan Zha, Tzanio Kolev, Robert Anderson, Daniel Faissol

Large-scale finite element simulations of complex physical systems governed by partial differential equations (PDE) crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required.

Inductive Bias, reinforcement-learning

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations Findings (EMNLP) 2021 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation, NMT

COUnty aggRegation mixup AuGmEntation (COURAGE) COVID-19 Prediction

no code implementations 3 May 2021 Siawpeng Er, Shihao Yang, Tuo Zhao

The global spread of COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, has posed a significant threat to mankind.

Computational Efficiency, Time Series

Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach

no code implementations 18 May 2021 Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.

Inductive Bias, Multi-agent Reinforcement Learning

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

no code implementations 15 Aug 2021 Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao

For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound of the margin obtained by BPPA with respect to an arbitrarily chosen norm.

QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction

no code implementations 19 Aug 2021 Danqing Zhang, Zheng Li, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao, Qiang Yang

We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms.

Attribute, Attribute Value Extraction

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

no code implementations 7 Sep 2021 Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao

Most existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data.

Binary Classification

Self-Training with Differentiable Teacher

no code implementations Findings (NAACL) 2022 Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha

In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels.

Named Entity Recognition

Adversarially Regularized Policy Learning Guided by Trajectory Optimization

no code implementations 16 Sep 2021 Zhigen Zhao, Simiao Zuo, Tuo Zhao, Ye Zhao

Recent advancement in combining trajectory optimization with function approximation (especially neural networks) shows promise in learning complex control policies for diverse tasks in robot systems.

Robot Manipulation

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

no code implementations ICLR 2022 Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao

Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that magnitudes of $X$ and $Y$ at the limit of GD iterations will be close even if their initialization is significantly unbalanced.

A Principled Permutation Invariant Approach to Mean-Field Multi-Agent Reinforcement Learning

no code implementations 29 Sep 2021 Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.

Inductive Bias, Multi-agent Reinforcement Learning

Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits

no code implementations ICLR 2022 Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan

We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent.

Language Modelling, Recommendation Systems
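
Illustratively only (the paper's actual algorithm and step-size schedule differ): a frequency-aware update scales each token's embedding step by an inverse power of its running frequency, so rare tokens take larger steps. The helper below is hypothetical, and `token_ids` is assumed to contain unique ids within the batch.

```python
import torch

def frequency_scaled_step(emb, grads, token_ids, counts, lr=0.5):
    counts[token_ids] += 1                          # running token frequencies
    step = lr / counts[token_ids].float().sqrt()    # rare tokens -> larger steps
    emb[token_ids] -= step[:, None] * grads
```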

Learning to Defense by Learning to Attack

no code implementations ICLR Workshop DeepGenStruct 2019 Zhehui Chen, Haoming Jiang, Yuyang Shi, Bo Dai, Tuo Zhao

From the perspective of generative learning, our proposed method can be viewed as learning a deep generative model for generating adversarial samples, which is adaptive to the robust classification.

Adversarial Attack, Robust classification

Differentiable Top-$k$ with Optimal Transport

no code implementations NeurIPS Workshop LMCA 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-$k$ operation, i.e., finding the $k$ largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval, Retrieval

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

no code implementations 1 Jan 2022 Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao

Learning operators between infinite-dimensional spaces is an important learning task arising in a wide range of applications in machine learning, imaging science, mathematical modeling and simulations, etc.

Deep Learning Assisted End-to-End Synthesis of mm-Wave Passive Networks with 3D EM Structures: A Study on A Transformer-Based Matching Network

no code implementations 6 Jan 2022 Siawpeng Er, Edward Liu, Minshuo Chen, Yan Li, Yuqi Liu, Tuo Zhao, Hua Wang

This paper presents a deep learning assisted synthesis approach for direct end-to-end generation of RF/mm-wave passive matching network with 3D EM structures.

Block Policy Mirror Descent

no code implementations 15 Jan 2022 Guanghui Lan, Yan Li, Tuo Zhao

Despite the nonconvex nature of the problem and a partial update rule, we provide a unified analysis for several sampling schemes, and show that BPMD achieves fast linear convergence to the global optimality.

Reinforcement Learning (RL)

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity

no code implementations 24 Jan 2022 Yan Li, Guanghui Lan, Tuo Zhao

We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies.

Policy Gradient Methods
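
With KL divergence as the Bregman distance, one policy mirror descent step has a closed multiplicative form: the new policy is the old one reweighted by exponentiated action values and renormalized per state. A small tabular sketch (step size illustrative):

```python
import numpy as np

def kl_pmd_step(pi, Q, eta=1.0):
    """pi: (n_states, n_actions) row-stochastic; Q: matching action values.
    Update: pi'(a|s) is proportional to pi(a|s) * exp(eta * Q(s, a))."""
    new = pi * np.exp(eta * Q)
    return new / new.sum(axis=1, keepdims=True)
```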

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

no code implementations 7 Feb 2022 Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao

We investigate the role of noise in optimization algorithms for learning over-parameterized models.

A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks

no code implementations 4 May 2022 Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie

Based on the approximation theory of neural networks, we show that the neural network IPM test has the type-II risk in the order of $n^{-(s+\beta)/d}$, which is in the same order of the type-II risk as the Hölder IPM test.

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

no code implementations 6 Jun 2022 Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks.

Off-policy evaluation

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

no code implementations 9 Jun 2022 Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao

Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness.

Image Classification

DiP-GNN: Discriminative Pre-Training of Graph Neural Networks

no code implementations 15 Sep 2022 Simiao Zuo, Haoming Jiang, Qingyu Yin, Xianfeng Tang, Bing Yin, Tuo Zhao

Specifically, we train a generator to recover identities of the masked edges, and simultaneously, we train a discriminator to distinguish the generated edges from the original graph's edges.

Node Classification

Differentially Private Estimation of Hawkes Process

no code implementations 15 Sep 2022 Simiao Zuo, Tianyi Liu, Tuo Zhao, Hongyuan Zha

Point process models are of great importance in real world applications.

Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites

no code implementations 15 Sep 2022 Simiao Zuo, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang, Tuo Zhao

The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network.

Graph Attention

First-order Policy Optimization for Robust Markov Decision Process

no code implementations 21 Sep 2022 Yan Li, Guanghui Lan, Tuo Zhao

We consider the problem of solving robust Markov decision process (MDP), which involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels.

Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data

no code implementations 14 Feb 2023 Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang

Furthermore, the generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.

HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers

no code implementations 19 Feb 2023 Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao

Since the teacher model has a significantly larger capacity and stronger representation power than the student model, it is very difficult for the student to produce predictions that match the teacher's over a massive amount of open-domain training data.

Knowledge Distillation, Model Compression

On Deep Generative Models for Approximation and Estimation of Distributions on Manifolds

no code implementations 25 Feb 2023 Biraj Dahal, Alex Havrilla, Minshuo Chen, Tuo Zhao, Wenjing Liao

Many existing experiments have demonstrated that generative networks can generate high-dimensional complex data from a low-dimensional easy-to-sample distribution.

Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories

no code implementations 26 Jun 2023 Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wenjing Liao, Tuo Zhao

Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures.

regression

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

no code implementations 4 Jul 2023 Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom.

Pivotal Estimation of Linear Discriminant Analysis in High Dimensions

no code implementations 18 Sep 2023 Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao

We consider the linear discriminant analysis problem in high-dimensional settings.

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds

no code implementations 25 Sep 2023 Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0, 1]$ indicates the smoothness of environment.

Policy Gradient Methods, Reinforcement Learning (RL)

Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification

no code implementations 25 Oct 2023 Zichong Li, Qunzhi Xu, Zhenghao Xu, Yajun Mei, Tuo Zhao, Hongyuan Zha

Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching and offers uncertainty quantification for the predicted event time, location and mark by computing confidence regions over the generated samples.

Point Processes, Uncertainty Quantification

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult

no code implementations 26 Oct 2023 Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao

This regularity, together with gradient descent using a large learning rate that favors flatter regions, results in these nontrivial dynamical behaviors.

Data Diversity Matters for Robust Instruction Tuning

no code implementations 21 Nov 2023 Alexander Bukharin, Tuo Zhao

QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study on the effect of diversity and quality on instruction tuning performance.

Instruction Following

BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

no code implementations 16 Feb 2024 Haoyu Wang, Tuo Zhao, Jing Gao

Retrieval-augmented Large Language Models (LLMs) offer substantial benefits in enhancing performance across knowledge-intensive scenarios.

Open-Domain Question Answering, Retrieval

Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

no code implementations 3 Apr 2024 Hoang Huy Nguyen, Yan Li, Tuo Zhao

In modern decentralized applications, communication efficiency and user privacy are the key challenges.
