Search Results for author: Tuo Zhao

Found 128 papers, 29 papers with code

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

1 code implementation EMNLP 2021 Haoming Jiang, Bo Dai, Mengjiao Yang, Tuo Zhao, Wei Wei

An ideal environment for evaluating dialog systems, also known as the Turing test, requires human interaction, which is usually not affordable for large-scale experiments.

Model-based Reinforcement Learning, Off-policy evaluation

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

6 code implementations ACL 2020 Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Linguistic Acceptability, Natural Language Inference
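
For intuition, here is a minimal sketch of the smoothness-inducing regularization idea that SMART builds on: penalize how much predictions change under a small perturbation of the input embeddings. The `model` (mapping embeddings to logits) and the random perturbation are illustrative assumptions; the paper instead finds an adversarial perturbation by projected gradient steps and adds Bregman proximal point updates.

```python
import torch
import torch.nn.functional as F

def smoothness_regularizer(model, embeds, eps=1e-3):
    # Reference predictions on the clean embeddings (treated as fixed).
    with torch.no_grad():
        p = F.softmax(model(embeds), dim=-1)
    # Random unit perturbation; SMART instead searches for the worst-case
    # perturbation inside an eps-ball with projected gradient ascent.
    noise = eps * F.normalize(torch.randn_like(embeds), dim=-1)
    q_log = F.log_softmax(model(embeds + noise), dim=-1)
    # Penalize divergence between clean and perturbed predictions.
    return F.kl_div(q_log, p, reduction="batchmean")
```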

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations 18 Mar 2023 Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods have been proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., as low-rank increments.

Question Answering, Text Generation
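
As background for the low-rank increments mentioned above, a minimal sketch of a LoRA-style layer: the pre-trained weight is frozen and only a rank-r update is trained. AdaLoRA goes further by parameterizing the increment in an SVD-like form and adaptively pruning singular values to allocate the rank budget; that part is not shown, and all names and dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained weight W0 plus a trainable low-rank increment B @ A."""
    def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        self.A = nn.Parameter(0.01 * torch.randn(rank, in_dim))  # trainable
        self.B = nn.Parameter(torch.zeros(out_dim, rank))        # zero init: no update at start
        self.scale = alpha / rank

    def forward(self, x):
        # Effective weight is W0 + scale * B @ A; only A and B receive gradients.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T
```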

Transformer Hawkes Process

3 code implementations ICML 2020 Simiao Zuo, Haoming Jiang, Zichong Li, Tuo Zhao, Hongyuan Zha

Modern data acquisition routinely produces massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets.

Computational Efficiency, Point Processes
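
For context, the classical Hawkes process that this paper generalizes models the event intensity as a baseline rate plus exponentially decaying excitation from past events; Transformer Hawkes replaces the fixed parametric kernel with a Transformer over the event history. A small sketch with illustrative parameters:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    past = np.array([ti for ti in event_times if ti < t])
    return mu + (alpha * np.exp(-beta * (t - past))).sum()

print(hawkes_intensity(2.0, [0.5, 1.2, 1.9]))  # intensity spikes after recent events
```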

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

1 code implementation 12 Oct 2023 Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao

Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning.

Natural Language Understanding, Quantization
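
A rough sketch of the alternating initialization idea behind LoftQ: jointly choose a quantized backbone Q and a low-rank pair (B, A) so that Q + BA stays close to the original weight. The uniform quantizer below is a stand-in (the paper works with NF4/uniform quantization), and the rank, bit width, and iteration count are illustrative.

```python
import torch

def loftq_style_init(W, rank=16, n_iter=5, n_bits=4):
    B = torch.zeros(W.shape[0], rank)
    A = torch.zeros(rank, W.shape[1])
    for _ in range(n_iter):
        # Quantize the part not explained by the current low-rank term.
        R = W - B @ A
        scale = R.abs().max() / (2 ** (n_bits - 1) - 1)
        Q = torch.round(R / scale).clamp(-2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1) * scale
        # Refit the low-rank term to the quantization residual via truncated SVD.
        U, S, Vh = torch.linalg.svd(W - Q, full_matrices=False)
        B = U[:, :rank] * S[:rank]
        A = Vh[:rank]
    return Q, B, A
```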

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

1 code implementation ACL 2021 Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao

Unfortunately, we observe that weakly labeled data does not necessarily improve, and can even deteriorate, model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data.

Named Entity Recognition

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

1 code implementation 3 Nov 2023 Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.
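
A minimal sketch of the post-hoc steering idea: during generation, downweight attention to tokens outside a user-highlighted span on selected heads and renormalize, so the model "reads" the emphasized text more. The names `scores`, `highlight_mask`, and `alpha` are illustrative; the paper additionally profiles which heads to steer.

```python
import torch

def steer_attention(scores, highlight_mask, alpha=0.01):
    """scores: (..., seq_len) raw attention logits for one head;
    highlight_mask: bool tensor marking user-emphasized tokens."""
    probs = torch.softmax(scores, dim=-1)
    probs = torch.where(highlight_mask, probs, probs * alpha)  # demote the rest
    return probs / probs.sum(dim=-1, keepdim=True)             # renormalize
```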

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

1 code implementation 27 Jun 2020 Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression, and scaled sparse linear regression), combined with efficient active set selection strategies.

regression, Sparse Learning
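
The pathwise idea is easy to state: solve a sequence of problems from the largest regularization level down to the smallest, warm-starting each from the previous solution, with a closed-form soft-thresholding coordinate update. A minimal NumPy sketch for the lasso only (the library itself also covers logistic/Poisson losses, nonconvex penalties, and active-set tricks):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_path(X, y, lambdas, n_sweeps=50):
    """Pathwise coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    beta, path = np.zeros(p), []
    for lam in sorted(lambdas, reverse=True):        # warm starts along the path
        for _ in range(n_sweeps):
            for j in range(p):
                r = y - X @ beta + X[:, j] * beta[j]  # partial residual
                beta[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
        path.append(beta.copy())
    return path
```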

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

1 code implementation 8 Mar 2024 Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao

Key-value (KV) caching has become the de facto technique for accelerating generation in large language model (LLM) inference.

Quantization
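
A toy sketch of the recipe's ingredients: separate a few large-magnitude outlier entries, uniformly quantize the rest, and (in the full recipe) approximate the remaining residual with a low-rank term. Bit width and outlier fraction below are illustrative.

```python
import torch

def compress_kv(kv, n_bits=4, outlier_frac=0.01):
    flat = kv.abs().flatten()
    k = max(1, int(outlier_frac * flat.numel()))
    thresh = flat.topk(k).values.min()
    outliers = torch.where(kv.abs() >= thresh, kv, torch.zeros_like(kv))
    rest = kv - outliers                 # nearly outlier-free, easier to quantize
    scale = rest.abs().max() / (2 ** (n_bits - 1) - 1)
    q = torch.round(rest / scale).clamp(-2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    # GEAR additionally fits a low-rank approximation to the residual rest - q * scale.
    return q.to(torch.int8), scale, outliers.to_sparse()
```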

Taming Sparsely Activated Transformer with Stochastic Experts

1 code implementation ICLR 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao

While most ongoing research focuses on improving sparsely activated models (SAMs) by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect: the commonly used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.

Machine Translation, Translation
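
A condensed sketch of the stochastic-experts idea: route each input to a uniformly random expert (no learned gate) and, during training, regularize two randomly routed forward passes toward consistent predictions. The two-pass consistency term below is a simplification of the paper's objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomlyRoutedExperts(nn.Module):
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x):
        idx = torch.randint(len(self.experts), (1,)).item()  # uniform random routing
        return self.experts[idx](x)

def consistency_loss(logits_a, logits_b):
    """Symmetric KL between two randomly routed passes over the same batch."""
    log_a, log_b = F.log_softmax(logits_a, -1), F.log_softmax(logits_b, -1)
    return 0.5 * (F.kl_div(log_a, log_b.exp(), reduction="batchmean")
                  + F.kl_div(log_b, log_a.exp(), reduction="batchmean"))
```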

Efficient Long Sequence Modeling via State Space Augmented Transformer

1 code implementation 15 Dec 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao

Specifically, we augment an SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.

Computational Efficiency, Language Modelling
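
For the local-attention layers, a naive causal sliding-window sketch: each position attends only to the previous `window` positions, so cost grows linearly in sequence length rather than quadratically. Shapes and the explicit loop are illustrative (real implementations vectorize this).

```python
import torch

def causal_local_attention(q, k, v, window=64):
    """q, k, v: (seq_len, dim); each query sees at most `window` past keys."""
    L, d = q.shape
    out = torch.empty_like(v)
    for i in range(L):
        lo = max(0, i - window + 1)
        w = torch.softmax(q[i] @ k[lo:i + 1].T / d ** 0.5, dim=-1)
        out[i] = w @ v[lo:i + 1]
    return out
```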

Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data

1 code implementation EMNLP 2020 Lingkai Kong, Haoming Jiang, Yuchen Zhuang, Jie Lyu, Tuo Zhao, Chao Zhang

Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization.

Language Modelling, Out of Distribution (OOD) Detection
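
Miscalibration of this kind is usually quantified with the expected calibration error (ECE): bin predictions by confidence and average the gap between confidence and accuracy. A standard sketch (the bin count is a common default, not from the paper):

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """confidence: max predicted probability per example; correct: 0/1 array."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by bin mass
    return ece
```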

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

1 code implementation 4 Oct 2022 Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.

Language Modelling, Model Compression
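
The mechanism reduces to matching filtered hidden states layer by layer: each model gets a small task-trained "filter" head, and the student imitates only what the filters keep. A minimal sketch with illustrative dimensions:

```python
import torch.nn as nn
import torch.nn.functional as F

def ted_layer_loss(student_h, teacher_h, student_filter, teacher_filter):
    """Match task-relevant projections of aligned layers, not raw hidden states."""
    return F.mse_loss(student_filter(student_h), teacher_filter(teacher_h))

# One filter pair per aligned layer; the filters are first trained on the task.
student_filter = nn.Linear(384, 128)
teacher_filter = nn.Linear(768, 128)
```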

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation ACL 2021 Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.

Model Compression, Multi-Task Learning
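
Tickets are typically extracted by magnitude pruning: keep the largest weights and zero the rest, yielding a binary mask that defines the subnetwork. A minimal one-shot sketch (lottery-ticket studies often prune iteratively with rewinding):

```python
import torch

def magnitude_mask(weight, sparsity=0.5):
    """Return a 0/1 mask keeping the largest-magnitude entries of `weight`."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    thresh = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > thresh).float()

mask = magnitude_mask(torch.randn(256, 256), sparsity=0.7)
# Training then proceeds with weight * mask, i.e., only the "ticket" is active.
```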

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

1 code implementation ACL 2022 Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.

Ensemble Learning

Meta Learning with Relational Information for Short Sequences

1 code implementation NeurIPS 2019 Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha

This paper proposes a new meta-learning method, HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences), for learning heterogeneous point process models from short event sequence data along with a relational network.

Meta-Learning

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

1 code implementation NeurIPS 2021 Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao

Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff}/\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but is independent of the number of agents.

Multi-agent Reinforcement Learning

Deep Reinforcement Learning with Hierarchical Reward Modeling

1 code implementation 6 Sep 2023 Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao

Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals.

Reinforcement Learning (RL)

Machine Learning Force Fields with Data Cost Aware Training

1 code implementation 5 Jun 2023 Alexander Bukharin, Tianyi Liu, Shengjie Wang, Simiao Zuo, Weihao Gao, Wen Yan, Tuo Zhao

To address this issue, we propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.

To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

1 code implementation 6 Apr 2024 Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang

In this paper, we present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs.

On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations 13 Jun 2018 Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We establish a margin based data dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks.

Generalization Bounds

On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization

no code implementations 13 Jun 2018 Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao

However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown.

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

no code implementations NeurIPS 2018 Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao

Asynchronous momentum stochastic gradient descent (Async-MSGD) is one of the most popular algorithms in distributed machine learning.

Stochastic Optimization

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

no code implementations NeurIPS 2018 Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao

Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.

Dimensionality Reduction, Stochastic Optimization

Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models

no code implementations 11 Mar 2018 Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao

We propose a nonparametric method for detecting nonlinear causal relationship within a set of multidimensional discrete time series, by using sparse additive models (SpAMs).

Additive models, Model Selection

On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions

no code implementations 19 Jun 2017 Xingguo Li, Lin F. Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.

Sparse Learning

A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization

no code implementations 14 Feb 2018 Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao

Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.

Bayesian Inference, Dimensionality Reduction

On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don't Worry About Its Nonsmooth Loss Function

no code implementations 25 May 2016 Xingguo Li, Haoming Jiang, Jarvis Haupt, Raman Arora, Han Liu, Mingyi Hong, Tuo Zhao

Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility.

regression

Deep Hyperspherical Learning

no code implementations NeurIPS 2017 Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao, Le Song

In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres.

Representation Learning

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

no code implementations 29 Dec 2016 Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao

We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks).

Retrieval

Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction

no code implementations 9 May 2016 Xingguo Li, Raman Arora, Han Liu, Jarvis Haupt, Tuo Zhao

We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints.

Sparse Learning, Stochastic Optimization

Misspecified Nonconvex Statistical Optimization for Phase Retrieval

no code implementations 18 Dec 2017 Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.

Retrieval

Online Factorization and Partition of Complex Networks From Random Walks

no code implementations 22 May 2017 Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang

We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.

Clustering

Homotopy Parametric Simplex Method for Sparse Learning

no code implementations 4 Apr 2017 Haotian Pang, Robert Vanderbei, Han Liu, Tuo Zhao

High dimensional sparse learning has imposed a great computational challenge on large-scale data analysis.

regression, Sparse Learning

On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization

no code implementations 10 Jul 2016 Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Mingyi Hong

In particular, we first show that for a family of quadratic minimization problems, the iteration complexity $\mathcal{O}(\log^2(p)\cdot\log(1/\epsilon))$ of the CBCD-type methods matches that of the GD methods in terms of dependency on $p$, up to a $\log^2 p$ factor.

regression

The Physical Systems Behind Optimization Algorithms

no code implementations NeurIPS 2018 Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao

We use differential equations based approaches to provide some physics insights into analyzing the dynamics of popular optimization algorithms in machine learning.

BIG-bench Machine Learning

Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory

no code implementations 23 Dec 2014 Tuo Zhao, Han Liu, Tong Zhang

This is the first result on the computational and statistical guarantees of the pathwise coordinate optimization framework in high dimensions.

Sparse Learning

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

no code implementations NeurIPS 2016 Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang

We study a stochastic and distributed algorithm for nonconvex problems whose objective consists of a sum of $N$ nonconvex $L_i/N$-smooth functions, plus a nonsmooth regularizer.

Stochastic Optimization

Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery

no code implementations 10 May 2013 Han Liu, Lie Wang, Tuo Zhao

We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models.

Activity Prediction, regression

Provable Gaussian Embedding with One Observation

no code implementations NeurIPS 2018 Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang

In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.

BIG-bench Machine Learning

Learning to Defend by Learning to Attack

no code implementations 3 Nov 2018 Haoming Jiang, Zhehui Chen, Yuyang Shi, Bo Dai, Tuo Zhao

Adversarial training provides a principled approach for training robust neural networks.

Adversarial Attack, Adversarial Defense
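
For contrast with the learned attacker proposed in this paper, the classical inner step of adversarial training crafts a perturbation by hand, e.g., one signed-gradient (FGSM) step; the paper replaces hand-designed rules like this with an attacker network trained jointly with the defender. A minimal sketch of the classical step:

```python
import torch

def fgsm_example(model, loss_fn, x, y, eps=0.03):
    """One-step inner maximization: perturb x along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.detach()
```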

Parametric Simplex Method for Sparse Learning

no code implementations NeurIPS 2017 Haotian Pang, Han Liu, Robert J. Vanderbei, Tuo Zhao

High dimensional sparse learning has imposed a great computational challenge on large-scale data analysis.

Sparse Learning

Multivariate Regression with Calibration

no code implementations NeurIPS 2014 Han Liu, Lie Wang, Tuo Zhao

We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models.

Activity Prediction, regression

Accelerated Mini-batch Randomized Block Coordinate Descent Method

no code implementations NeurIPS 2014 Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu

When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner.

Sparse Learning, Stochastic Optimization

Sparse Inverse Covariance Estimation with Calibration

no code implementations NeurIPS 2013 Tuo Zhao, Han Liu

We propose a semiparametric procedure for estimating a high dimensional sparse inverse covariance matrix.

Model Selection

On Tighter Generalization Bounds for Deep Neural Networks: CNNs, ResNets, and Beyond

no code implementations ICLR 2019 Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao

We propose a generalization error bound for a general family of deep neural networks based on the depth and width of the networks, as well as the spectral norm of weight matrices.

Generalization Bounds

On Computation and Generalization of GANs with Spectrum Control

no code implementations 28 Dec 2018 Haoming Jiang, Zhehui Chen, Minshuo Chen, Feng Liu, Dingding Wang, Tuo Zhao

Specifically, we propose a new reparameterization approach for the weight matrices of the discriminator in GANs, which allows us to directly manipulate the spectra of the weight matrices through various regularizers and constraints, without intensively computing singular value decompositions.

On Scalable and Efficient Computation of Large Scale Optimal Transport

no code implementations ICLR Workshop DeepGenStruct 2019 Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha

Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its widespread use.

Domain Adaptation

Inductive Bias of Gradient Descent based Adversarial Training on Separable Data

no code implementations 7 Jun 2019 Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao

Specifically, we show that when the adversarial perturbation during training has bounded $\ell_2$-norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$-norm margin classifier at the rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, significantly faster than the rate $\mathcal{O}(1/\log T)$ of training with clean data.

Binary Classification, Inductive Bias

Nonparametric Regression on Low-Dimensional Manifolds using Deep ReLU Networks : Function Approximation and Statistical Recovery

no code implementations NeurIPS 2019 Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao

It therefore demonstrates the adaptivity of deep ReLU networks to low-dimensional geometric structures of data, and partially explains the power of deep ReLU networks in tackling high-dimensional data with low-dimensional geometric structures.

regression

Towards Understanding the Importance of Noise in Training Neural Networks

no code implementations 7 Sep 2019 Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao

Much empirical evidence has corroborated that noise plays a crucial role in the effective and efficient training of neural networks.

Towards Understanding the Importance of Shortcut Connections in Residual Networks

no code implementations NeurIPS 2019 Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao

We show, however, that gradient descent combined with proper normalization, avoids being trapped by the spurious local optimum, and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0, and that of the second layer is initialized arbitrarily in a ball.

On Generalization Bounds of a Family of Recurrent Neural Networks

no code implementations ICLR 2019 Minshuo Chen, Xingguo Li, Tuo Zhao

We remark: (1) Our generalization bound for vanilla RNNs is significantly tighter than the best of existing results; (2) We are not aware of any other generalization bounds for MGU, LSTM, and Conv RNNs in the existing literature; (3) We demonstrate the advantages of these variants in generalization.

Generalization Bounds, PAC learning

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing

1 code implementation ACL 2020 Haoming Jiang, Chen Liang, Chong Wang, Tuo Zhao

To overcome this limitation, we propose a novel multi-domain NMT model using individual modules for each domain, on which we apply word-level, adaptive and layer-wise domain mixing.

Machine Translation, NMT

Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds

no code implementations NeurIPS 2019 Minshuo Chen, Haoming Jiang, Wenjing Liao, Tuo Zhao

The network size scales exponentially in the approximation error, with an exponent depending on the intrinsic dimension of the data and the smoothness of the function.

Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks

no code implementations 10 Feb 2020 Minshuo Chen, Wenjing Liao, Hongyuan Zha, Tuo Zhao

Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning.

Differentiable Top-k Operator with Optimal Transport

no code implementations 16 Feb 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval, Retrieval
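
A compact sketch of the optimal-transport view: cast top-k selection as transporting n scores onto two anchors ("selected" near the largest score, "rejected" near the smallest) with column masses k/n and (n-k)/n; entropic regularization plus Sinkhorn iterations makes the resulting soft membership differentiable. The anchors, epsilon, and iteration count below are illustrative simplifications of the paper's formulation.

```python
import torch

def soft_topk(scores, k, eps=0.1, n_iter=200):
    n = scores.shape[0]
    anchors = torch.stack([scores.max(), scores.min()])  # selected / rejected
    C = (scores[:, None] - anchors[None, :]) ** 2         # transport costs
    K = torch.exp(-C / eps)
    r = torch.full((n,), 1.0 / n)                         # row marginals
    c = torch.tensor([k / n, (n - k) / n])                # column marginals
    u = torch.ones(n)
    for _ in range(n_iter):                               # Sinkhorn scaling
        v = c / (K.T @ u)
        u = r / (K @ v)
    P = u[:, None] * K * v[None, :]
    return n * P[:, 0]   # soft indicator of membership in the top-k set

print(soft_topk(torch.tensor([0.1, 2.0, 0.3, 1.5]), k=2))  # ~[0, 1, 0, 1]
```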

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective

no code implementations 14 Feb 2020 Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.

Towards Understanding Hierarchical Learning: Benefits of Neural Representations

no code implementations NeurIPS 2020 Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher

When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.

The huge Package for High-dimensional Undirected Graph Estimation in R

no code implementations 26 Jun 2020 Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty, Larry Wasserman

We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data.

Model Selection, Vocal Bursts Intensity Prediction

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

no code implementations 27 Jun 2020 Xingguo Li, Tuo Zhao, Xiaoming Yuan, Han Liu

This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME).

regression

Residual Network Based Direct Synthesis of EM Structures: A Study on One-to-One Transformers

no code implementations 25 Aug 2020 David Munzer, Siawpeng Er, Minshuo Chen, Yan Li, Naga S. Mannem, Tuo Zhao, Hua Wang

We propose using machine learning models for the direct synthesis of on-chip electromagnetic (EM) passive structures to enable rapid or even automated designs and optimizations of RF/mm-Wave circuits.

BIG-bench Machine Learning

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

no code implementations ICLR 2020 Yan Li, Ethan X. Fang, Huan Xu, Tuo Zhao

Specifically, we show that for any fixed iteration $T$, when the adversarial perturbation during training has a properly bounded L2 norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum L2 norm margin classifier at the rate of $O(1/\sqrt{T})$, significantly faster than the rate $O(1/\log T)$ of training with clean data.

Binary Classification

Deep Reinforcement Learning with Smooth Policy

no code implementations ICML 2020 Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

In contrast to policy parameterized by linear/reproducing kernel functions, where simple regularization techniques suffice to control smoothness, for neural network based reinforcement learning algorithms, there is no readily available solution to learn a smooth policy.

Reinforcement Learning (RL)

How Important is the Train-Validation Split in Meta-Learning?

no code implementations 12 Oct 2020 Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (\emph{train-val method}) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.

Meta-Learning

Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks

no code implementations 3 Nov 2020 Minshuo Chen, Hao Liu, Wenjing Liao, Tuo Zhao

Our theory shows that deep neural networks are adaptive to the low-dimensional geometric structures of the covariates, and partially explains the success of deep learning for causal inference.

Causal Inference

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

no code implementations NeurIPS 2020 Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity.

Differentiable Top-k with Optimal Transport

no code implementations NeurIPS 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

Finding the k largest or smallest elements from a collection of scores, i.e., the top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining.

Information Retrieval, Retrieval

A Hypergradient Approach to Robust Regression without Correspondence

no code implementations ICLR 2021 Yujia Xie, Yixiu Mao, Simiao Zuo, Hongteng Xu, Xiaojing Ye, Tuo Zhao, Hongyuan Zha

Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and are limited to linear regression models.

Multi-Object Tracking, regression

Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

no code implementations 24 Feb 2021 Tianyi Liu, Yan Li, Song Wei, Enlu Zhou, Tuo Zhao

Much empirical evidence has corroborated the importance of noise in nonconvex optimization problems.

Reinforcement Learning for Adaptive Mesh Refinement

no code implementations 1 Mar 2021 Jiachen Yang, Tarik Dzanic, Brenden Petersen, Jun Kudo, Ketan Mittal, Vladimir Tomov, Jean-Sylvain Camier, Tuo Zhao, Hongyuan Zha, Tzanio Kolev, Robert Anderson, Daniel Faissol

Large-scale finite element simulations of complex physical systems governed by partial differential equations (PDE) crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required.

Inductive Bias, reinforcement-learning

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations Findings (EMNLP) 2021 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation, NMT

COUnty aggRegation mixup AuGmEntation (COURAGE) COVID-19 Prediction

no code implementations 3 May 2021 Siawpeng Er, Shihao Yang, Tuo Zhao

The global spread of COVID-19, the disease caused by the novel coronavirus SARS-CoV-2, has posed a significant threat to mankind.

Computational Efficiency, Time Series

Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach

no code implementations 18 May 2021 Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.

Inductive Bias, Multi-agent Reinforcement Learning

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

no code implementations 15 Aug 2021 Yan Li, Caleb Ju, Ethan X. Fang, Tuo Zhao

For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound of the margin obtained by BPPA with respect to an arbitrarily chosen norm.

QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction

no code implementations 19 Aug 2021 Danqing Zhang, Zheng Li, Tianyu Cao, Chen Luo, Tony Wu, Hanqing Lu, Yiwei Song, Bing Yin, Tuo Zhao, Qiang Yang

We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms.

Attribute, Attribute Value Extraction

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

no code implementations 7 Sep 2021 Hao Liu, Minshuo Chen, Tuo Zhao, Wenjing Liao

Most existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data.

Binary Classification

Self-Training with Differentiable Teacher

no code implementations Findings (NAACL) 2022 Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha

In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels.

Named Entity Recognition

Adversarially Regularized Policy Learning Guided by Trajectory Optimization

no code implementations 16 Sep 2021 Zhigen Zhao, Simiao Zuo, Tuo Zhao, Ye Zhao

Recent advancement in combining trajectory optimization with function approximation (especially neural networks) shows promise in learning complex control policies for diverse tasks in robot systems.

Robot Manipulation

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect

no code implementations ICLR 2022 Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao

Moreover, we rigorously establish an implicit bias of GD induced by such a large learning rate, termed 'balancing', meaning that magnitudes of $X$ and $Y$ at the limit of GD iterations will be close even if their initialization is significantly unbalanced.

A Principled Permutation Invariant Approach to Mean-Field Multi-Agent Reinforcement Learning

no code implementations 29 Sep 2021 Yan Li, Lingxiao Wang, Jiachen Yang, Ethan Wang, Zhaoran Wang, Tuo Zhao, Hongyuan Zha

To exploit the permutation invariance therein, we propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.

Inductive Bias, Multi-agent Reinforcement Learning

Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits

no code implementations ICLR 2022 Yan Li, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao, Guanghui Lan

We show that incorporating frequency information of tokens in the embedding learning problems leads to provably efficient algorithms, and demonstrate that common adaptive algorithms implicitly exploit the frequency information to a large extent.

Language Modelling, Recommendation Systems
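
Illustratively only (the paper's actual algorithm and step-size schedule differ): a frequency-aware update scales each token's embedding step by an inverse power of its running frequency, so rare tokens take larger steps. The helper below is hypothetical, and `token_ids` is assumed to contain unique ids within the batch.

```python
import torch

def frequency_scaled_step(emb, grads, token_ids, counts, lr=0.5):
    counts[token_ids] += 1                          # running token frequencies
    step = lr / counts[token_ids].float().sqrt()    # rare tokens -> larger steps
    emb[token_ids] -= step[:, None] * grads
```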

Learning to Defense by Learning to Attack

no code implementations ICLR Workshop DeepGenStruct 2019 Zhehui Chen, Haoming Jiang, Yuyang Shi, Bo Dai, Tuo Zhao

From the perspective of generative learning, our proposed method can be viewed as learning a deep generative model for generating adversarial samples, which is adaptive to the robust classification.

Adversarial Attack, Robust classification

Differentiable Top-$k$ with Optimal Transport

no code implementations NeurIPS Workshop LMCA 2020 Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

The top-$k$ operation, i.e., finding the $k$ largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining.

Information Retrieval, Retrieval

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

no code implementations 1 Jan 2022 Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao

Learning operators between infinite-dimensional spaces is an important learning task arising in a wide range of applications in machine learning, imaging science, mathematical modeling and simulations, etc.

Deep Learning Assisted End-to-End Synthesis of mm-Wave Passive Networks with 3D EM Structures: A Study on A Transformer-Based Matching Network

no code implementations 6 Jan 2022 Siawpeng Er, Edward Liu, Minshuo Chen, Yan Li, Yuqi Liu, Tuo Zhao, Hua Wang

This paper presents a deep learning assisted synthesis approach for direct end-to-end generation of RF/mm-wave passive matching network with 3D EM structures.

Block Policy Mirror Descent

no code implementations 15 Jan 2022 Guanghui Lan, Yan Li, Tuo Zhao

Despite the nonconvex nature of the problem and a partial update rule, we provide a unified analysis for several sampling schemes, and show that BPMD achieves fast linear convergence to the global optimality.

Reinforcement Learning (RL)

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity

no code implementations 24 Jan 2022 Yan Li, Guanghui Lan, Tuo Zhao

We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies.

Policy Gradient Methods
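
With KL divergence as the Bregman distance, one policy mirror descent step has a closed multiplicative form: the new policy is the old one reweighted by exponentiated action values and renormalized per state. A small tabular sketch (step size illustrative):

```python
import numpy as np

def kl_pmd_step(pi, Q, eta=1.0):
    """pi: (n_states, n_actions) row-stochastic; Q: matching action values.
    Update: pi'(a|s) is proportional to pi(a|s) * exp(eta * Q(s, a))."""
    new = pi * np.exp(eta * Q)
    return new / new.sum(axis=1, keepdims=True)
```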

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

no code implementations 7 Feb 2022 Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao

We investigate the role of noise in optimization algorithms for learning over-parameterized models.

A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks

no code implementations 4 May 2022 Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie

Based on the approximation theory of neural networks, we show that the neural network IPM test has the type-II risk in the order of $n^{-(s+\beta)/d}$, which is in the same order of the type-II risk as the Hölder IPM test.

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

no code implementations 6 Jun 2022 Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks.

Off-policy evaluation

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

no code implementations 9 Jun 2022 Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao

Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness.

Image Classification

DiP-GNN: Discriminative Pre-Training of Graph Neural Networks

no code implementations 15 Sep 2022 Simiao Zuo, Haoming Jiang, Qingyu Yin, Xianfeng Tang, Bing Yin, Tuo Zhao

Specifically, we train a generator to recover identities of the masked edges, and simultaneously, we train a discriminator to distinguish the generated edges from the original graph's edges.

Node Classification

Differentially Private Estimation of Hawkes Process

no code implementations 15 Sep 2022 Simiao Zuo, Tianyi Liu, Tuo Zhao, Hongyuan Zha

Point process models are of great importance in real world applications.

Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites

no code implementations 15 Sep 2022 Simiao Zuo, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang, Tuo Zhao

The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network.

Graph Attention

First-order Policy Optimization for Robust Markov Decision Process

no code implementations 21 Sep 2022 Yan Li, Guanghui Lan, Tuo Zhao

We consider the problem of solving robust Markov decision process (MDP), which involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels.

Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data

no code implementations 14 Feb 2023 Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang

Furthermore, the generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.

HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers

no code implementations 19 Feb 2023 Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao

Since the teacher model has a significantly larger capacity and stronger representation power than the student model, it is very difficult for the student to produce predictions that match the teacher's over a massive amount of open-domain training data.

Knowledge Distillation, Model Compression

On Deep Generative Models for Approximation and Estimation of Distributions on Manifolds

no code implementations 25 Feb 2023 Biraj Dahal, Alex Havrilla, Minshuo Chen, Tuo Zhao, Wenjing Liao

Many existing experiments have demonstrated that generative networks can generate high-dimensional complex data from a low-dimensional easy-to-sample distribution.

Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories

no code implementations 26 Jun 2023 Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wenjing Liao, Tuo Zhao

Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures.

regression

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

no code implementations 4 Jul 2023 Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom.

Pivotal Estimation of Linear Discriminant Analysis in High Dimensions

no code implementations 18 Sep 2023 Ethan X. Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu, Tuo Zhao

We consider the linear discriminant analysis problem in high-dimensional settings.

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds

no code implementations 25 Sep 2023 Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0, 1]$ indicates the smoothness of environment.

Policy Gradient Methods, Reinforcement Learning (RL)

Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification

no code implementations 25 Oct 2023 Zichong Li, Qunzhi Xu, Zhenghao Xu, Yajun Mei, Tuo Zhao, Hongyuan Zha

Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching and offers uncertainty quantification for the predicted event time, location and mark by computing confidence regions over the generated samples.

Point Processes, Uncertainty Quantification

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult

no code implementations 26 Oct 2023 Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao

This regularity, together with gradient descent using a large learning rate that favors flatter regions, results in these nontrivial dynamical behaviors.

Data Diversity Matters for Robust Instruction Tuning

no code implementations 21 Nov 2023 Alexander Bukharin, Tuo Zhao

QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study on the effect of diversity and quality on instruction tuning performance.

Instruction Following

BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

no code implementations 16 Feb 2024 Haoyu Wang, Tuo Zhao, Jing Gao

Retrieval-augmented Large Language Models (LLMs) offer substantial benefits in enhancing performance across knowledge-intensive scenarios.

Open-Domain Question Answering, Retrieval

Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

no code implementations 3 Apr 2024 Hoang Huy Nguyen, Yan Li, Tuo Zhao

In modern decentralized applications, communication efficiency and user privacy are the key challenges.
