Search Results for author: Mengdi Wang

Found 60 papers, 8 papers with code

Optimal policy evaluation using kernel-based temporal difference methods

no code implementations24 Sep 2021 Yaqi Duan, Mengdi Wang, Martin J. Wainwright

Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error.

Boosting the Convergence of Reinforcement Learning-based Auto-pruning Using Historical Data

no code implementations16 Jul 2021 Jiandong Mu, Mengdi Wang, Feiwen Zhu, Jun Yang, Wei Lin, Wei Zhang

Reinforcement learning (RL)-based auto-pruning has been proposed to automate the DNN pruning process and avoid expensive hand-crafted work.

Neural Network Compression Transfer Learning

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

no code implementations15 Jun 2021 Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

To close this gap, we take a step towards persistent exploration in continuous space through policy parameterizations with heavier-tailed distributions, governed by a tail-index parameter $\alpha$, which increases the likelihood of jumping in state space.

Continuous Control Decision Making
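
As a rough illustration of the heavier-tailed parameterization the abstract describes (not the paper's code; the location, scale, and tail index below are assumed stand-ins for policy-network outputs), a Lévy $\alpha$-stable policy makes large exploratory jumps far more likely than a Gaussian one:

```python
# Illustrative sketch only: compare action samples from a Gaussian policy and
# a heavy-tailed alpha-stable policy with the same location and scale.
# Lowering the tail index alpha fattens the tails, so big jumps become likely.
import numpy as np
from scipy.stats import levy_stable, norm

rng = np.random.default_rng(0)
mu, scale = 0.0, 1.0        # assumed outputs of a policy network
alpha = 1.5                 # tail index; alpha = 2 recovers the Gaussian

gauss = norm.rvs(loc=mu, scale=scale, size=10000, random_state=rng)
heavy = levy_stable.rvs(alpha, beta=0.0, loc=mu, scale=scale,
                        size=10000, random_state=rng)

# Heavy tails => many more large deviations, i.e., occasional big jumps.
print("P(|a| > 5), Gaussian    :", np.mean(np.abs(gauss) > 5))
print("P(|a| > 5), alpha-stable:", np.mean(np.abs(heavy) > 5))
```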

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

1 code implementation4 Jun 2021 Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to compress once and deploy everywhere.

AutoML Model Compression

1$\times$N Block Pattern for Network Sparsity

1 code implementation31 May 2021 Mingbao Lin, Yuchao Li, Yuxin Zhang, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

In particular, consecutive $N$ output kernels with the same input channel index are grouped into one block, which serves as a basic pruning granularity of our pruning pattern.
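
A minimal numpy sketch of this granularity, as we read it from the abstract (block scoring by L1 norm is our assumption, and this is not the authors' released code):

```python
# Group consecutive N output kernels sharing an input channel index into one
# block, score each block, and zero out the lowest-scoring fraction.
import numpy as np

def one_by_n_prune(weight: np.ndarray, n: int, sparsity: float) -> np.ndarray:
    out_c, in_c, kh, kw = weight.shape
    assert out_c % n == 0, "out_channels must be divisible by N"
    blocks = weight.reshape(out_c // n, n, in_c, kh, kw)
    scores = np.abs(blocks).sum(axis=(1, 3, 4))        # L1 norm per 1xN block
    k = int(scores.size * sparsity)                    # blocks to remove
    thresh = np.partition(scores.ravel(), k)[k]
    mask = (scores >= thresh)[:, None, :, None, None]  # broadcast to weights
    return (blocks * mask).reshape(out_c, in_c, kh, kw)

pruned = one_by_n_prune(np.random.randn(16, 8, 3, 3), n=4, sparsity=0.5)
print("kept fraction:", np.count_nonzero(pruned) / pruned.size)
```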

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

no code implementations29 May 2021 Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".

Multi-agent Reinforcement Learning
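
In symbols (a hedged reading of the abstract; the discounted normalization of the occupancy measure is our assumption), with local utility $F$:

\[
\lambda^{\pi}(s,a) = (1-\gamma)\sum_{t=0}^{\infty}\gamma^{t}\,\Pr(s_t = s,\, a_t = a \mid \pi),
\qquad
r_{\text{shadow}}(s,a) = \frac{\partial F(\lambda^{\pi})}{\partial \lambda^{\pi}(s,a)}.
\]

For a linear utility $F(\lambda) = \langle r, \lambda \rangle$, the shadow reward is just the ordinary reward $r$, recovering the classic actor-critic setting.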

Learning Good State and Action Representations via Tensor Decomposition

no code implementations3 May 2021 Chengzhuo Ni, Anru Zhang, Yaqi Duan, Mengdi Wang

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.

Tensor Decomposition
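
A hedged sketch of that structure on synthetic data (the truncated SVD of one unfolding is a simple stand-in for the tensor decomposition the paper develops):

```python
# Form the empirical transition tensor P[s, a, s'] of a finite MDP, then
# compress its (state, action) x next-state unfolding with a truncated SVD.
import numpy as np

def empirical_tensor(states, actions, n_states, n_actions):
    P = np.zeros((n_states, n_actions, n_states))
    for t in range(len(states) - 1):
        P[states[t], actions[t], states[t + 1]] += 1.0
    counts = P.sum(axis=2, keepdims=True)
    return np.divide(P, counts, out=np.zeros_like(P), where=counts > 0)

rng = np.random.default_rng(0)
S, A, T = 6, 3, 5000
states, actions = rng.integers(0, S, size=T), rng.integers(0, A, size=T)

P = empirical_tensor(states, actions, S, A)
U, d, Vt = np.linalg.svd(P.reshape(S * A, S), full_matrices=False)
r = 3                                            # assumed intrinsic rank
P_low = ((U[:, :r] * d[:r]) @ Vt[:r]).reshape(S, A, S)
print("relative error:", np.linalg.norm(P_low - P) / np.linalg.norm(P))
```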

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

no code implementations17 Feb 2021 Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.

Bootstrapping Statistical Inference for Off-Policy Evaluation

no code implementations6 Feb 2021 Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less understood.

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

no code implementations NeurIPS 2020 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

no code implementations9 Nov 2020 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.

Online Sparse Reinforcement Learning

no code implementations8 Nov 2020 Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations8 Nov 2020 Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

Feature Selection Model Selection
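
A simplified sketch of a Lasso fitted Q-evaluation loop (not the exact estimator analyzed in the paper; `phi` and `pi` below are assumed user-supplied):

```python
# phi maps (state, action) to a sparse high-dimensional feature vector and
# pi is the deterministic target policy; S, A, R, S_next are batch arrays.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fqe(phi, pi, S, A, R, S_next, gamma=0.99, n_iters=50, lam=0.01):
    X = np.stack([phi(s, a) for s, a in zip(S, A)])
    X_next = np.stack([phi(s, pi(s)) for s in S_next])  # next actions from pi
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        y = R + gamma * X_next @ w                      # Bellman targets
        w = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return w                          # estimate: Q_pi(s, a) ~ phi(s, a) @ w
```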

High-Dimensional Sparse Linear Bandits

no code implementations NeurIPS 2020 Botao Hao, Tor Lattimore, Mengdi Wang

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.

Generalized Leverage Score Sampling for Neural Networks

no code implementations NeurIPS 2020 Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu

Leverage score sampling is a powerful technique, originating in theoretical computer science, that can be used to speed up a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, the cutting plane method, graph sparsification, maximum matching, and max-flow.

Learning Theory
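
For reference, the classical primitive looks as follows (a sketch of plain row sampling by leverage scores, not the generalized neural-network scheme the paper develops):

```python
# Score each row of A by the squared norm of its row in the left singular
# basis, then subsample rows with probability proportional to the score,
# rescaling so the sketch is unbiased for A^T A.
import numpy as np

def leverage_scores(A: np.ndarray) -> np.ndarray:
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return (U ** 2).sum(axis=1)                    # l_i = ||U[i, :]||^2

def leverage_sample(A: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    return A[idx] / np.sqrt(m * p[idx, None])

A = np.random.randn(1000, 20)
A_s = leverage_sample(A, m=100)
print(np.linalg.norm(A_s.T @ A_s - A.T @ A) / np.linalg.norm(A.T @ A))
```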

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

no code implementations NeurIPS 2020 Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang

Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

1 code implementation27 Jun 2020 Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression, and scaled sparse linear regression) combined with efficient active set selection strategies.

Sparse Learning
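
To illustrate the framework (a bare-bones pathwise coordinate descent for the plain Lasso, not the picasso API, and without the library's active-set strategies):

```python
# Sweep a decreasing grid of regularization levels, warm-starting each Lasso
# solve at the previous solution and updating one coordinate at a time.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_path(X, y, n_lambdas=20, n_sweeps=50):
    n, d = X.shape
    col_sq = (X ** 2).sum(axis=0)
    lam_max = np.abs(X.T @ y).max() / n
    lambdas = np.geomspace(lam_max, lam_max * 1e-3, n_lambdas)
    w, path = np.zeros(d), []
    for lam in lambdas:
        for _ in range(n_sweeps):
            for j in range(d):
                r_j = y - X @ w + X[:, j] * w[j]       # partial residual
                w[j] = soft_threshold(X[:, j] @ r_j / n, lam) / (col_sq[j] / n)
        path.append((lam, w.copy()))
    return path
```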

Model-Based Reinforcement Learning with Value-Targeted Regression

no code implementations ICML 2020 Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

We propose a model-based RL algorithm based on the optimism principle: in each episode, the set of models that are "consistent" with the data collected is constructed.

Model-based Reinforcement Learning

Concept Annotation for Intelligent Textbooks

no code implementations22 May 2020 Mengdi Wang, Hung Chau, Khushboo Thaker, Peter Brusilovsky, Daqing He

The outcomes of our work include a validated knowledge engineering procedure, a code-book for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further research.

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

no code implementations27 Feb 2020 Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.

Sketching Transformed Matrices with Applications to Natural Language Processing

no code implementations23 Feb 2020 Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang

We show that our approach obtains small error and is efficient in both space and time.

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

no code implementations ICML 2020 Yaqi Duan, Mengdi Wang

We prove that this method is information-theoretically optimal and has nearly minimal estimation error.

Continuous Control with Contexts, Provably

no code implementations30 Oct 2019 Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.

Continuous Control

Characterizing Deep Learning Training Workloads on Alibaba-PAI

no code implementations14 Oct 2019 Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia

One critical issue for efficiently operating practical AI clouds is to characterize the computing and data transfer demands of these workloads, and more importantly, the training performance given the underlying software framework and hardware configurations.

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

no code implementations29 Aug 2019 Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye

In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.

Q-Learning

Voting-Based Multi-Agent Reinforcement Learning for Intelligent IoT

no code implementations2 Jul 2019 Yue Xu, Zengde Deng, Mengdi Wang, Wenjun Xu, Anthony Man-Cho So, Shuguang Cui

The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT.

Decision Making Multi-agent Reinforcement Learning

Learning Markov models via low-rank optimization

no code implementations28 Jun 2019 Ziwei Zhu, Xudong Li, Mengdi Wang, Anru Zhang

We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states.

Decision Making
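
A hedged sketch of the simplest spectral variant of this idea (the paper's estimator is based on rank-constrained optimization, not this plain truncation):

```python
# Estimate a low-rank transition matrix from one trajectory: count
# transitions, normalize rows, truncate the SVD, and project back to a
# row-stochastic matrix.
import numpy as np

def lowrank_transition(traj, n_states, rank):
    C = np.zeros((n_states, n_states))
    for s, s_next in zip(traj[:-1], traj[1:]):
        C[s, s_next] += 1.0
    rows = C.sum(axis=1, keepdims=True)
    P_hat = np.divide(C, rows, out=np.full_like(C, 1.0 / n_states),
                      where=rows > 0)
    U, d, Vt = np.linalg.svd(P_hat)
    P_r = np.clip((U[:, :rank] * d[:rank]) @ Vt[:rank], 0.0, None)
    return P_r / np.maximum(P_r.sum(axis=1, keepdims=True), 1e-12)
```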

Feature-Based Q-Learning for Two-Player Stochastic Games

no code implementations2 Jun 2019 Zeyu Jia, Lin F. Yang, Mengdi Wang

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space.

Q-Learning

Learning low-dimensional state embeddings and metastable clusters from time series data

no code implementations NeurIPS 2019 Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank.

Time Series

RL4health: Crowdsourcing Reinforcement Learning for Knee Replacement Pathway Optimization

no code implementations24 May 2019 Hao Lu, Mengdi Wang

Joint replacement is the most common inpatient surgical treatment in the US.

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

no code implementations ICML 2020 Lin F. Yang, Mengdi Wang

In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space.

Learning to Control in Metric Space with Optimal Regret

1 code implementation5 May 2019 Lin F. Yang, Chengzhuo Ni, Mengdi Wang

We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces.

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

no code implementations13 Feb 2019 Lin F. Yang, Mengdi Wang

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model.

Q-Learning

Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model

no code implementations NeurIPS 2018 Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.
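
For orientation, the naive baseline under this access model is to estimate the model and plan (a sketch of the straightforward point of comparison, not the paper's variance-reduced algorithm):

```python
# sampler(s, a) draws one next state from the generative model in O(1).
import numpy as np

def plan_with_generative_model(sampler, n_states, n_actions, reward,
                               gamma=0.9, n_samples=100, n_vi_iters=500):
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sampler(s, a)] += 1.0 / n_samples
    V = np.zeros(n_states)
    for _ in range(n_vi_iters):                   # value iteration on the
        Q = reward + gamma * P_hat @ V            # estimated model
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                       # greedy policy
```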

Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks

no code implementations21 Nov 2018 Mengdi Wang, Qing Zhang, Jun Yang, Xiaoyuan Cui, Wei Lin

In this method, the network is viewed as a computational graph, in which the vertices denote the computation nodes and edges represent the information flow.

Knowledge Distillation Model Compression

State Aggregation Learning from Markov Transition Data

no code implementations NeurIPS 2019 Yaqi Duan, Zheng Tracy Ke, Mengdi Wang

Our proposed method is a simple two-step algorithm: the first step is a spectral decomposition of the empirical transition matrix, and the second step conducts a linear transformation of the singular vectors to find their approximate convex hull.
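
A crude sketch of the two steps (k-means here stands in for the paper's convex-hull step, so this is illustrative only):

```python
# Step 1: spectral decomposition of the empirical transition matrix.
# Step 2: group states by their low-dimensional spectral embedding.
import numpy as np
from sklearn.cluster import KMeans

def aggregate_states(P_hat: np.ndarray, n_groups: int) -> np.ndarray:
    U, d, _ = np.linalg.svd(P_hat)
    emb = U[:, :n_groups] * d[:n_groups]       # low-dim state embedding
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(emb)
```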

Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains

no code implementations14 Oct 2018 Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan

The efficiency and statistical properties of our approach are illustrated on synthetic data.

Diffusion Approximations for Online Principal Component Estimation and Global Convergence

no code implementations NeurIPS 2017 Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang

In this paper, we propose to adopt diffusion approximation tools to study the dynamics of Oja's iteration, an online stochastic gradient descent method for principal component analysis.
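
Oja's iteration itself is a few lines (the step size and initialization below are assumptions of this sketch):

```python
# Stochastic-gradient ascent on the Rayleigh quotient with renormalization:
# each sample x nudges w toward the top eigenvector of E[x x^T].
import numpy as np

def oja(stream, dim, eta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for x in stream:                  # x: one sample from the data stream
        w += eta * x * (x @ w)        # online SGD step
        w /= np.linalg.norm(w)        # project back to the unit sphere
    return w                          # estimate of the top principal component
```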

Scalable Bilinear Pi Learning Using State and Action Features

no code implementations ICML 2018 Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

1 code implementation5 Jun 2018 Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.

Optimization and Control

Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient

1 code implementation1 Jun 2018 Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan

Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming.

Scalable Bilinear $π$ Learning Using State and Action Features

no code implementations27 Apr 2018 Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Estimation of Markov Chain via Rank-Constrained Likelihood

no code implementations ICML 2018 Xudong Li, Mengdi Wang, Anru Zhang

This paper studies the estimation of low-rank Markov chains from empirical trajectories.

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

no code implementations NeurIPS 2018 Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao

Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.

Dimensionality Reduction Stochastic Optimization

Spectral State Compression of Markov Processes

no code implementations8 Feb 2018 Anru Zhang, Mengdi Wang

Model reduction of Markov processes is a basic problem in modeling state-transition systems.

Improved Oracle Complexity of Variance Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization

no code implementations7 Feb 2018 Tianyi Lin, Chenyou Fan, Mengdi Wang

We consider the nonsmooth convex composition optimization problem where the objective is a composition of two finite-sum functions and analyze stochastic compositional variance reduced gradient (SCVRG) methods for them.

Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality

no code implementations7 Dec 2017 Woon Sang Cho, Mengdi Wang

We believe that the primal-dual updates to the value and policy functions would expedite the learning process.

Q-Learning

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

1 code implementation27 Oct 2017 Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye

Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor $\gamma\in(0, 1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy, with probability $1 - \delta$, in time \[ \tilde{O}\left( \left(|S|^2 |A| + \frac{|S| |A|}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right). \]

Primal-Dual $π$ Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

no code implementations17 Oct 2017 Mengdi Wang

Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions.

Online Factorization and Partition of Complex Networks From Random Walks

no code implementations22 May 2017 Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang

We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.

Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

no code implementations20 May 2017 Yi-Chen Chen, Mengdi Wang

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space of size $|\mathcal{S}|$ and a finite action space of size $|\mathcal{A}|$.

Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning

no code implementations8 Dec 2016 Yi-Chen Chen, Mengdi Wang

We study the online estimation of the optimal policy of a Markov decision process (MDP).

Accelerating Stochastic Composition Optimization

no code implementations NeurIPS 2016 Mengdi Wang, Ji Liu, Ethan X. Fang

The ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with a nonsmooth regularization penalty.

Near-Optimal Stochastic Approximation for Online Principal Component Estimation

no code implementations16 Mar 2016 Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang

We prove for the first time a nearly optimal finite-sample error bound for the online PCA algorithm.

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions

no code implementations14 Nov 2014 Mengdi Wang, Ethan X. Fang, Han Liu

For smooth convex problems, the SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case.
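
The basic two-time-scale update is short (step-size schedules below are assumptions of this sketch; the paper derives the rates quoted above):

```python
# SCGD for min_x E[f_v(E[g_w(x)])]: the auxiliary y tracks the inner
# expectation g(x) on a fast time scale, while x takes a quasi-gradient step
# through the chain rule on a slow time scale.
import numpy as np

def scgd(sample_g, sample_g_jac, sample_grad_f, x0, n_iters=10000):
    x = np.asarray(x0, dtype=float)
    y = sample_g(x)                              # running estimate of g(x)
    for k in range(1, n_iters + 1):
        alpha, beta = k ** -0.75, k ** -0.5      # assumed step-size schedules
        y = (1 - beta) * y + beta * sample_g(x)
        x = x - alpha * sample_g_jac(x).T @ sample_grad_f(y)
    return x
```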
