no code implementations • 1 Dec 2022 • Jinghan Wang, Mengdi Wang, Lin F. Yang
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator).
no code implementations • 2 Nov 2022 • Le Xie, Tong Huang, Xiangtian Zheng, Yan Liu, Mengdi Wang, Vijay Vittal, P. R. Kumar, Srinivas Shakkottai, Yi Cui
The transition towards carbon-neutral electricity is one of the biggest game changers in addressing climate change since it addresses the dual challenges of removing carbon emissions from the two largest sectors of emitters: electricity and transportation.
no code implementations • 30 Oct 2022 • Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang
To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.
no code implementations • 3 Oct 2022 • Ming Yin, Mengdi Wang, Yu-Xiang Wang
Offline reinforcement learning, which aims to optimize sequential decision-making strategies using historical data, has been applied extensively in real-world problems.
no code implementations • 29 Jun 2022 • Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang
Online influence maximization aims to maximize the influence spread of a content in a social network with unknown network model by selecting a few seed nodes.
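A minimal sketch of the bandit view of this problem, assuming an independent-cascade spread model with unknown edge activation probabilities that are estimated from observed activations; the toy graph, the UCB-style bonus, and the greedy seed-scoring rule are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy directed graph with unknown edge activation probabilities (independent cascade model).
n_nodes = 30
true_p = np.where(rng.random((n_nodes, n_nodes)) < 0.1,
                  rng.uniform(0.2, 0.8, (n_nodes, n_nodes)), 0.0)
np.fill_diagonal(true_p, 0.0)

def cascade(p, seeds):
    """Simulate one independent-cascade spread; return activated nodes and observed edge trials."""
    active, frontier, trials = set(seeds), list(seeds), []
    while frontier:
        nxt = []
        for u in frontier:
            for v in range(n_nodes):
                if p[u, v] > 0 and v not in active:
                    success = rng.random() < p[u, v]
                    trials.append((u, v, success))
                    if success:
                        active.add(v)
                        nxt.append(v)
        frontier = nxt
    return active, trials

# Online loop: optimistic (UCB-style) edge estimates drive greedy seed selection.
succ, cnt = np.zeros((n_nodes, n_nodes)), np.zeros((n_nodes, n_nodes))
k_seeds, horizon = 3, 200
for t in range(1, horizon + 1):
    ucb = np.clip(succ / np.maximum(cnt, 1) + np.sqrt(np.log(t + 1) / np.maximum(cnt, 1)), 0.0, 1.0)
    seeds = np.argsort(-ucb.sum(axis=1))[:k_seeds]   # crude proxy for expected spread
    activated, trials = cascade(true_p, seeds)
    for u, v, success in trials:
        cnt[u, v] += 1
        succ[u, v] += success
print("spread in the final round:", len(activated))
```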
no code implementations • 22 Jun 2022 • Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task, multi-agent learning and federated learning.
no code implementations • 10 Jun 2022 • Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang
We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.
no code implementations • 10 Jun 2022 • Ming Yin, Wenjing Chen, Mengdi Wang, Yu-Xiang Wang
Goal-oriented Reinforcement Learning, where the agent needs to reach the goal state while simultaneously minimizing the cost, has received significant attention in real-world applications.
no code implementations • 6 Jun 2022 • Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks.
no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.
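A schematic sketch of the Thompson-sampling-plus-directed-evolution idea, assuming a linear fitness model over one-hot sequence features, a conjugate Gaussian posterior, and simple point mutations; the population size, mutation rate, and query budget are made-up illustrations, not the paper's exact TS-DE procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
L, A, noise_sd = 12, 4, 0.5                 # sequence length, alphabet size, measurement noise
true_theta = rng.normal(size=L * A)         # unknown sequence-to-function mapping (linear here)

def one_hot(seq):
    x = np.zeros((L, A)); x[np.arange(L), seq] = 1.0
    return x.ravel()

def measure(seq):                            # costly, noisy query of the unknown fitness
    return one_hot(seq) @ true_theta + rng.normal(scale=noise_sd)

def mutate(seq, rate=0.15):
    child = seq.copy()
    flips = rng.random(L) < rate
    child[flips] = rng.integers(0, A, flips.sum())
    return child

d = L * A
XtX, Xty = np.zeros((d, d)), np.zeros(d)     # sufficient statistics for the Gaussian posterior
population = [rng.integers(0, A, L) for _ in range(20)]

for _ in range(15):                          # rounds of directed evolution
    cov = np.linalg.inv(np.eye(d) + XtX / noise_sd**2)
    cov = (cov + cov.T) / 2                  # symmetrize for numerical safety
    mean = cov @ (Xty / noise_sd**2)
    theta_sample = rng.multivariate_normal(mean, cov)               # Thompson sample of the model
    candidates = population + [mutate(s) for s in population]       # mutation step
    scores = [one_hot(s) @ theta_sample for s in candidates]
    population = [candidates[i] for i in np.argsort(scores)[-20:]]  # selection under the sample
    for s in population[:5]:                                        # query a few selected sequences
        x = one_hot(s)
        XtX += np.outer(x, x); Xty += x * measure(s)

best = max(population, key=lambda s: one_hot(s) @ true_theta)
print("best true fitness found:", round(float(one_hot(best) @ true_theta), 2))
```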
no code implementations • 1 Jun 2022 • Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu
We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.
1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.
2 code implementations • 23 May 2022 • Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai
With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models.
no code implementations • 11 Mar 2022 • Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang
However, a precise understanding of the statistical limits of learning with function representations remains elusive, even when such a representation is linear.
no code implementations • 10 Feb 2022 • Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang
We approach this problem using Z-estimation theory and establish the following results: the FQE estimation error is asymptotically normal, with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; the finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by a function-class-dependent divergence that measures how the off-policy distribution shift intertwines with the function approximator.
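For concreteness, a minimal sketch of fitted Q-evaluation (FQE) with a linear function class, assuming a logged dataset of (s, a, r, s') transitions and a deterministic target policy; the feature map and policy names are placeholders, and the Z-estimation analysis above concerns the fixed point that this iteration computes.

```python
import numpy as np

def fqe_linear(dataset, phi, pi_target, gamma=0.95, n_iters=200, ridge=1e-6):
    """Fitted Q-evaluation with a linear function class.

    dataset   : list of (s, a, r, s_next) transitions logged by a behavior policy
    phi(s, a) : d-dimensional feature map (the function class is {phi(s, a) @ w})
    pi_target : deterministic policy being evaluated, pi_target(s) -> action
    Returns w such that Q^pi(s, a) is approximated by phi(s, a) @ w.
    """
    d = phi(*dataset[0][:2]).shape[0]
    Phi = np.stack([phi(s, a) for s, a, _, _ in dataset])       # regression design matrix
    A = Phi.T @ Phi + ridge * np.eye(d)
    w = np.zeros(d)
    for _ in range(n_iters):
        # Bellman targets under the target policy, using the current Q estimate.
        y = np.array([r + gamma * phi(s2, pi_target(s2)) @ w for _, _, r, s2 in dataset])
        w = np.linalg.solve(A, Phi.T @ y)                       # least-squares regression step
    return w
```

The value of the target policy can then be read off by averaging phi(s_0, pi_target(s_0)) @ w over a sample of initial states.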
no code implementations • 31 Jan 2022 • Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.
1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.
no code implementations • 29 Sep 2021 • Yu Wu, Joseph Chahn Kim, Chengzhuo Ni, Le Cong, Mengdi Wang
Genetic barcoding coupled with single-cell sequencing technology enables direct measurement of cell-to-cell transitions and gene-expression evolution over a long timespan.
no code implementations • 24 Sep 2021 • Yaqi Duan, Mengdi Wang, Martin J. Wainwright
Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error.
no code implementations • 16 Jul 2021 • Jiandong Mu, Mengdi Wang, Feiwen Zhu, Jun Yang, Wei Lin, Wei zhang
Reinforcement learning (RL)-based auto-pruning has further been proposed to automate the DNN pruning process and avoid expensive hand-crafted work.
no code implementations • 15 Jun 2021 • Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel
To close this gap, we take a step towards persistent exploration in continuous space through policy parameterizations defined by heavier-tailed distributions with tail-index parameter $\alpha$, which increases the likelihood of jumping in state space.
1 code implementation • 4 Jun 2021 • Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji
Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.
1 code implementation • 31 May 2021 • Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji
We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements, and then applies a similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations.
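A toy NumPy illustration of the rearrangement idea, under the simplifying assumptions that "influence" is measured by the filter l1-norm and that pruning removes whole blocks of consecutive output channels; the actual 1xN block-sparsity criterion and rearrangement used in the paper may differ.

```python
import numpy as np

def rearrange_and_block_prune(W1, W2, block=4, keep_ratio=0.5):
    """Reorder the output channels of W1 (and the matching input channels of the next
    layer W2, so the composition is unchanged) by a simple l1 'influence' score, then
    drop the blocks of consecutive channels with the smallest total influence.

    W1: (out_ch, in_ch, k, k) weights of the pruned layer (out_ch divisible by `block`)
    W2: (out2, out_ch, k, k) weights of the following layer"""
    norms = np.abs(W1).reshape(W1.shape[0], -1).sum(axis=1)      # per-filter l1 norm
    order = np.argsort(-norms)                                   # most influential filters first
    W1r, W2r = W1[order], W2[:, order]                           # paired rearrangement
    block_scores = norms[order].reshape(-1, block).sum(axis=1)
    n_keep = max(1, int(len(block_scores) * keep_ratio))
    keep_blocks = np.sort(np.argsort(-block_scores)[:n_keep])
    keep_ch = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep_blocks])
    return W1r[keep_ch], W2r[:, keep_ch]

W1 = np.random.randn(16, 8, 3, 3)
W2 = np.random.randn(32, 16, 3, 3)
w1p, w2p = rearrange_and_block_prune(W1, W2)
print(w1p.shape, w2p.shape)   # (8, 8, 3, 3) (32, 8, 3, 3)
```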
no code implementations • 29 May 2021 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".
1 code implementation • CVPR 2021 • Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, Jincheng Ma, Qi Tian, Rongrong Ji
Channel pruning and tensor decomposition have received extensive attention in convolutional neural network compression.
no code implementations • 3 May 2021 • Chengzhuo Ni, Anru Zhang, Yaqi Duan, Mengdi Wang
The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.
no code implementations • NeurIPS 2021 • Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang
By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a globally $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.
no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical property is less understood.
no code implementations • NeurIPS 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan
Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.
no code implementations • 9 Nov 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.
no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.
no code implementations • NeurIPS 2020 • Botao Hao, Tor Lattimore, Mengdi Wang
Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.
no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu
Leverage score sampling is a powerful technique that originates from theoretical computer science and can be used to speed up a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, the cutting-plane method, graph sparsification, maximum matching, and max-flow.
no code implementations • NeurIPS 2020 • Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang
Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
1 code implementation • 27 Jun 2020 • Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao
We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression, and scaled sparse linear regression) combined with efficient active set selection strategies.
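The core recipe can be sketched in a few lines; below is a generic pathwise coordinate descent for the lasso with warm starts along a decreasing regularization path, written in NumPy as an illustration, not the picasso API or its exact active-set strategy.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_path(X, y, lambdas, n_sweeps=200, tol=1e-7):
    """Pathwise coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    The problems are solved from the largest to the smallest lam, warm-starting each
    solution from the previous one; a real active-set strategy would restrict most
    sweeps to the currently nonzero coordinates."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0)
    beta, path = np.zeros(p), []
    for lam in sorted(lambdas, reverse=True):
        for _ in range(n_sweeps):
            beta_old = beta.copy()
            for j in range(p):
                r_j = y - X @ beta + X[:, j] * beta[j]          # partial residual excluding j
                beta[j] = soft_threshold(X[:, j] @ r_j, n * lam) / col_sq[j]
            if np.max(np.abs(beta - beta_old)) < tol:
                break
        path.append((lam, beta.copy()))
    return path
```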
no code implementations • L4DC 2020 • Hao Gong, Mengdi Wang
In light of the Bellman duality, we propose a novel value-policy gradient algorithm to explore and act in infinite-horizon Average-reward Markov Decision Process (AMDP) and show that it has sublinear regret.
no code implementations • ICML 2020 • Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang
We propose a model-based RL algorithm that is based on the optimism principle: in each episode, the set of models that are "consistent" with the data collected is constructed.
no code implementations • 22 May 2020 • Mengdi Wang, Hung Chau, Khushboo Thaker, Peter Brusilovsky, Daqing He
The outcomes of our work include a validated knowledge engineering procedure, a code-book for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as gold standard in further research.
no code implementations • 27 Feb 2020 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
We show that our approach obtains small error and is efficient in both space and time.
no code implementations • ICML 2020 • Yaqi Duan, Mengdi Wang
We prove that this method is information-theoretically optimal and has nearly minimal estimation error.
1 code implementation • EACL 2021 • Woon Sang Cho, Yizhe Zhang, Sudha Rao, Asli Celikyilmaz, Chenyan Xiong, Jianfeng Gao, Mengdi Wang, Bill Dolan
In the SL stage, a single-document question generator is trained.
no code implementations • 30 Oct 2019 • Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.
no code implementations • 14 Oct 2019 • Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia
One critical issue for efficiently operating practical AI clouds is to characterize the computing and data-transfer demands of these workloads and, more importantly, the training performance given the underlying software framework and hardware configurations.
no code implementations • 25 Sep 2019 • Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
Modern complex sequential decision-making problems often require both low-level policy execution and high-level planning.
no code implementations • 29 Aug 2019 • Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye
In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.
no code implementations • 2 Jul 2019 • Yue Xu, Zengde Deng, Mengdi Wang, Wenjun Xu, Anthony Man-Cho So, Shuguang Cui
The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT.
no code implementations • 28 Jun 2019 • Ziwei Zhu, Xudong Li, Mengdi Wang, Anru Zhang
We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states.
no code implementations • 2 Jun 2019 • Zeyu Jia, Lin F. Yang, Mengdi Wang
Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space.
no code implementations • NeurIPS 2019 • Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang
This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank.
no code implementations • ICML 2020 • Lin F. Yang, Mengdi Wang
In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space.
no code implementations • 24 May 2019 • Hao Lu, Mengdi Wang
Joint replacement is the most common inpatient surgical treatment in the US.
1 code implementation • 5 May 2019 • Lin F. Yang, Chengzhuo Ni, Mengdi Wang
We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces.
no code implementations • 13 Feb 2019 • Lin F. Yang, Mengdi Wang
Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model.
no code implementations • NeurIPS 2018 • Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time.
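A minimal sketch of this generative-model setting: plain Monte-Carlo Q-value iteration that calls the simulator m times per state-action pair. This is a baseline illustration of the access model, not the variance-reduced algorithm that achieves the paper's sample complexity; the toy MDP and parameter values are assumptions.

```python
import numpy as np

def sampled_value_iteration(sample, n_states, n_actions, rewards,
                            gamma=0.9, m=200, n_iters=50):
    """Approximate Q-value iteration using only a generative model.

    sample(s, a, size) draws `size` i.i.d. next states from P(.|s, a) -- the
    O(1)-time generative access assumed in the abstract.  The plain Monte-Carlo
    backup below is a baseline, not the paper's algorithm."""
    v = np.zeros(n_states)
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        for s in range(n_states):
            for a in range(n_actions):
                next_states = sample(s, a, m)                 # m simulator calls per (s, a)
                q[s, a] = rewards[s, a] + gamma * v[next_states].mean()
        v = q.max(axis=1)
    return q.argmax(axis=1), v

# Toy DMDP used as a stand-in simulator.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(5), size=(5, 2))                    # P[s, a] is a next-state distribution
R = rng.random((5, 2))
policy, values = sampled_value_iteration(
    lambda s, a, size: rng.choice(5, size=size, p=P[s, a]),
    n_states=5, n_actions=2, rewards=R)
print(policy, values.round(2))
```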
no code implementations • 21 Nov 2018 • Mengdi Wang, Qing Zhang, Jun Yang, Xiaoyuan Cui, Wei Lin
In this method, the network is viewed as a computational graph, in which the vertices denote the computation nodes and edges represent the information flow.
no code implementations • NeurIPS 2019 • Yaqi Duan, Zheng Tracy Ke, Mengdi Wang
Our proposed method is a simple two-step algorithm: The first step is spectral decomposition of empirical transition matrix, and the second step conducts a linear transformation of singular vectors to find their approximate convex hull.
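A minimal sketch of the first step, assuming trajectories have been collapsed into (state, next-state) pairs; the names transitions and r are placeholders, and the second step described above (the linear transformation of singular vectors that locates their approximate convex hull and yields soft aggregation weights) is deliberately omitted.

```python
import numpy as np

def spectral_state_embedding(transitions, n_states, r):
    """Step 1 of the two-step method: build the empirical transition matrix from
    (state, next_state) pairs and take its rank-r spectral decomposition.  Step 2
    of the abstract -- transforming the singular vectors to find their approximate
    convex hull and the soft aggregation weights -- is omitted here."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        counts[s, s_next] += 1
    P_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)   # empirical transition matrix
    U, S, Vt = np.linalg.svd(P_hat, full_matrices=False)
    return U[:, :r] * S[:r], Vt[:r].T        # rank-r left / right state embeddings
```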
no code implementations • WS 2019 • Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Michel Galley, Chris Brockett, Mengdi Wang, Jianfeng Gao
Generating coherent and cohesive long-form texts is a challenging task.
no code implementations • 14 Oct 2018 • Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan
The efficiency and statistical properties of our approach are illustrated on synthetic data.
no code implementations • 27 Sep 2018 • Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Mengdi Wang, Jianfeng Gao
Generating coherent and cohesive long-form texts is a challenging problem in natural language generation.
no code implementations • NeurIPS 2017 • Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang
In this paper, we propose to adopt diffusion approximation tools to study the dynamics of Oja's iteration, which is an online stochastic gradient descent method for principal component analysis.
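For reference, a minimal sketch of Oja's iteration itself; the 1/sqrt(t) step size and the toy Gaussian stream are illustrative choices, not the schedule analyzed via diffusion approximation in the paper.

```python
import numpy as np

def oja_top_component(stream, dim, lr=0.3):
    """Oja's iteration: an online stochastic gradient method for the top principal component.
    Each incoming sample x updates the estimate w, which is renormalized to the unit sphere."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    for t, x in enumerate(stream, start=1):
        w += (lr / np.sqrt(t)) * (x @ w) * x      # stochastic update with the rank-one sample x x^T
        w /= np.linalg.norm(w)                    # project back onto the unit sphere
    return w

# Example stream with a dominant principal direction along the first coordinate.
rng = np.random.default_rng(1)
data = rng.multivariate_normal(np.zeros(4), np.diag([5.0, 1.0, 1.0, 1.0]), size=5000)
w = oja_top_component(iter(data), dim=4)
print(np.abs(w).round(2))   # the first coordinate should dominate
```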
no code implementations • ICML 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang
In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.
1 code implementation • 5 Jun 2018 • Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time.
1 code implementation • 1 Jun 2018 • Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan
Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming.
no code implementations • 27 Apr 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang
In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.
no code implementations • ICML 2018 • Xudong Li, Mengdi Wang, Anru Zhang
This paper studies the estimation of low-rank Markov chains from empirical trajectories.
no code implementations • NeurIPS 2018 • Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao
Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.
no code implementations • 26 Feb 2018 • Sham Kakade, Mengdi Wang, Lin F. Yang
There is a technical issue in the analysis that is not easily fixable.
no code implementations • 8 Feb 2018 • Anru Zhang, Mengdi Wang
Model reduction of Markov processes is a basic problem in modeling state-transition systems.
no code implementations • 7 Feb 2018 • Tianyi Lin, Chenyou Fan, Mengdi Wang
We consider the nonsmooth convex composition optimization problem, where the objective is a composition of two finite-sum functions, and analyze stochastic compositional variance-reduced gradient (SCVRG) methods for it.
no code implementations • 7 Dec 2017 • Woon Sang Cho, Mengdi Wang
We believe that the primal-dual updates to the value and policy functions would expedite the learning process.
1 code implementation • 27 Oct 2017 • Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye
Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor $\gamma\in(0, 1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy, with probability $1 - \delta$, in time $\tilde{O}\left( \left(|S|^2 |A| + \frac{|S| |A|}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right)$.
no code implementations • 17 Oct 2017 • Mengdi Wang
Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions.
no code implementations • 22 May 2017 • Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang
We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.
no code implementations • 20 May 2017 • Yi-Chen Chen, Mengdi Wang
We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space of size $|\mathcal{S}|$ and a finite action space of size $|\mathcal{A}|$.
no code implementations • 8 Dec 2016 • Yi-Chen Chen, Mengdi Wang
We study the online estimation of the optimal policy of a Markov decision process (MDP).
no code implementations • NeurIPS 2016 • Mengdi Wang, Ji Liu, Ethan X. Fang
The ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with nonsmooth regularization penalty.
no code implementations • 16 Mar 2016 • Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang
We prove for the first time a nearly optimal finite-sample error bound for the online PCA algorithm.
no code implementations • 12 Nov 2015 • Mengdi Wang, Yi-Chen Chen, Jialin Liu, Yuantao Gu
Consider convex optimization problems subject to a large number of constraints.
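A hedged sketch of one natural algorithmic answer studied in this line of work: alternate a (sub)gradient step on the objective with a projection onto a single randomly sampled constraint set, rather than onto the intersection of all constraints. The step-size schedule, the half-space example, and all function names are illustrative, not the paper's exact method.

```python
import numpy as np

def random_projection_method(grad, project_one, n_constraints, x0, n_iters=5000, seed=0):
    """Alternate a diminishing-step gradient update on the objective with a projection
    onto ONE randomly sampled constraint set per iteration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iters + 1):
        x = x - (1.0 / np.sqrt(k)) * grad(x)       # diminishing-step gradient update
        i = rng.integers(n_constraints)            # sample a single constraint index
        x = project_one(i, x)                      # cheap projection onto that one set
    return x

# Toy usage: minimize ||x - c||^2 / 2 subject to many random half-spaces a_i^T x <= b_i.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5)); b = np.ones(200); c = 3 * np.ones(5)

def proj_halfspace(i, x):
    viol = A[i] @ x - b[i]
    return x if viol <= 0 else x - viol * A[i] / (A[i] @ A[i])

x_hat = random_projection_method(lambda x: x - c, proj_halfspace, 200, np.zeros(5))
print("max constraint violation:", float(np.max(A @ x_hat - b)))  # should be small
```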
no code implementations • 14 Nov 2014 • Mengdi Wang, Ethan X. Fang, Han Liu
For smooth convex problems, the SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case.
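A minimal sketch of the two-timescale SCGD update, assuming access to noisy samples of the inner function g, its Jacobian, and the gradient of the outer function f; the step sizes and the toy problem are illustrative, and the accelerated variant behind the quoted rates is not implemented here.

```python
import numpy as np

def scgd(sample_g, sample_jac_g, sample_grad_f, x0, y0, alpha=0.05, beta=0.2, n_iters=20000):
    """Two-timescale SCGD for min_x E[ f( E[ g(x) ] ) ].

    The auxiliary variable y tracks the inner expectation E[g(x)] through a fast
    running average (step beta), while x takes a slower quasi-gradient step using
    the chain rule  jac_g(x)^T grad_f(y)  (step alpha)."""
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        y = (1 - beta) * y + beta * sample_g(x, rng)                      # fast timescale
        x = x - alpha * sample_jac_g(x, rng).T @ sample_grad_f(y, rng)    # slow timescale
    return x, y

# Toy problem: g(x) = A x + noise, f(u) = ||u - t||^2 / 2, so the minimizer solves A x ≈ t.
rng0 = np.random.default_rng(1)
A = np.eye(3) + 0.1 * rng0.normal(size=(3, 3))
t = np.ones(3)
x_hat, _ = scgd(sample_g=lambda x, rng: A @ x + 0.1 * rng.normal(size=3),
                sample_jac_g=lambda x, rng: A + 0.1 * rng.normal(size=(3, 3)),
                sample_grad_f=lambda y, rng: y - t,
                x0=np.zeros(3), y0=np.zeros(3))
print("residual:", float(np.linalg.norm(A @ x_hat - t)))   # should be small
```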