Search Results for author: Mengdi Wang

Found 81 papers, 12 papers with code

Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP

no code implementations • 1 Dec 2022 • Jinghan Wang, Mengdi Wang, Lin F. Yang

This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator).

Energy System Digitization in the Era of AI: A Three-Layered Approach towards Carbon Neutrality

no code implementations • 2 Nov 2022 • Le Xie, Tong Huang, Xiangtian Zheng, Yan Liu, Mengdi Wang, Vijay Vittal, P. R. Kumar, Srinivas Shakkottai, Yi Cui

The transition towards carbon-neutral electricity is one of the biggest game changers in addressing climate change since it addresses the dual challenges of removing carbon emissions from the two largest sectors of emitters: electricity and transportation.

Decision Making

Representation Learning for General-sum Low-rank Markov Games

no code implementations • 30 Oct 2022 • Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang

To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.

Representation Learning

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

no code implementations • 3 Oct 2022 • Ming Yin, Mengdi Wang, Yu-Xiang Wang

Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications.

Decision Making Offline RL +3

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

no code implementations • 29 Jun 2022 • Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang

Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a few seed nodes.

Model-based Reinforcement Learning reinforcement-learning +1

Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks

no code implementations • 22 Jun 2022 • Shuoguang Yang, Xuezhou Zhang, Mengdi Wang

This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task, multi-agent learning and federated learning.

Bilevel Optimization Federated Learning +3

Communication Efficient Distributed Learning for Kernelized Contextual Bandits

no code implementations • 10 Jun 2022 • Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang

We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.

Multi-Armed Bandits

Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality

no code implementations • 10 Jun 2022 • Ming Yin, Wenjing Chen, Mengdi Wang, Yu-Xiang Wang

Goal-oriented Reinforcement Learning, where the agent needs to reach the goal state while simultaneously minimizing the cost, has received significant attention in real-world applications.

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

no code implementations • 6 Jun 2022 • Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks.

Off-policy evaluation

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

BIG-bench Machine Learning Learning Theory +1

Byzantine-Robust Online and Offline Distributed Reinforcement Learning

no code implementations • 1 Jun 2022 • Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu

We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.

reinforcement-learning reinforcement Learning

Provable Benefits of Representational Transfer in Reinforcement Learning

1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.

reinforcement-learning reinforcement Learning +1

Parameter-Efficient Sparsity for Large Language Models Fine-Tuning

2 code implementations • 23 May 2022 • Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai

With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models.

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

no code implementations • 11 Mar 2022 • Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang

However, a precise understanding of the statistical limits with function representations remains elusive, even when such a representation is linear.

Decision Making reinforcement-learning +1

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

no code implementations • 10 Feb 2022 • Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.

Off-policy evaluation

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

no code implementations • 31 Jan 2022 • Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning reinforcement Learning +1

Cell2State: Learning Cell State Representations From Barcoded Single-Cell Gene-Expression Transitions

no code implementations • 29 Sep 2021 • Yu Wu, Joseph Chahn Kim, Chengzhuo Ni, Le Cong, Mengdi Wang

Genetic barcoding coupled with single-cell sequencing technology enables direct measurement of cell-to-cell transitions and gene-expression evolution over a long timespan.

Dimensionality Reduction

Optimal policy evaluation using kernel-based temporal difference methods

no code implementations • 24 Sep 2021 • Yaqi Duan, Mengdi Wang, Martin J. Wainwright

Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error.

Boosting the Convergence of Reinforcement Learning-based Auto-pruning Using Historical Data

no code implementations • 16 Jul 2021 • Jiandong Mu, Mengdi Wang, Feiwen Zhu, Jun Yang, Wei Lin, Wei Zhang

Reinforcement learning (RL)-based auto-pruning has been further proposed to automate the DNN pruning process to avoid expensive hand-crafted work.

Neural Network Compression reinforcement-learning +2

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

no code implementations • 15 Jun 2021 • Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

To close this gap, we step towards persistent exploration in continuous space through policy parameterizations given by heavier-tailed distributions with tail-index parameter $\alpha$, which increases the likelihood of jumping in state space.

Continuous Control Decision Making

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

1 code implementation • 4 Jun 2021 • Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.

AutoML Model Compression

1xN Pattern for Pruning Convolutional Neural Networks

1 code implementation • 31 May 2021 • Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji

We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements and then applies similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations.

Network Pruning

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

no code implementations • 29 May 2021 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".

Multi-agent Reinforcement Learning

Learning Good State and Action Representations via Tensor Decomposition

no code implementations • 3 May 2021 • Chengzhuo Ni, Anru Zhang, Yaqi Duan, Mengdi Wang

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.

Tensor Decomposition

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

no code implementations • NeurIPS 2021 • Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

By assuming the overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less understood.

Off-policy evaluation

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

no code implementations • NeurIPS 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.

reinforcement-learning reinforcement Learning

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

no code implementations • 9 Nov 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.

reinforcement Learning

Online Sparse Reinforcement Learning

no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

reinforcement-learning reinforcement Learning

High-Dimensional Sparse Linear Bandits

no code implementations • NeurIPS 2020 • Botao Hao, Tor Lattimore, Mengdi Wang

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

Model Selection reinforcement-learning +1

Generalized Leverage Score Sampling for Neural Networks

no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu

Leverage score sampling is a powerful technique that originates from theoretical computer science, which can be used to speed up a large number of fundamental questions, e.g., linear regression, linear programming, semi-definite programming, cutting plane method, graph sparsification, maximum matching and max-flow.

Learning Theory regression

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

no code implementations • NeurIPS 2020 • Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang

Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.

reinforcement-learning reinforcement Learning +1

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

1 code implementation • 27 Jun 2020 • Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies.

regression Sparse Learning

A Duality Approach for Regret Minimization in Average-Award Ergodic Markov Decision Processes

no code implementations • L4DC 2020 • Hao Gong, Mengdi Wang

In light of the Bellman duality, we propose a novel value-policy gradient algorithm to explore and act in infinite-horizon Average-reward Markov Decision Process (AMDP) and show that it has sublinear regret.

Model-Based Reinforcement Learning with Value-Targeted Regression

no code implementations • ICML 2020 • Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

We propose a model-based RL algorithm that is based on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected is constructed.

Model-based Reinforcement Learning regression +2

Concept Annotation for Intelligent Textbooks

no code implementations • 22 May 2020 • Mengdi Wang, Hung Chau, Khushboo Thaker, Peter Brusilovsky, Daqing He

The outcomes of our work include a validated knowledge engineering procedure, a code-book for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further research.

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

no code implementations • 27 Feb 2020 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.

reinforcement-learning reinforcement Learning

Sketching Transformed Matrices with Applications to Natural Language Processing

no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang

We show that our approach obtains small error and is efficient in both space and time.

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

no code implementations • ICML 2020 • Yaqi Duan, Mengdi Wang

We prove that this method is information-theoretically optimal and has nearly minimal estimation error.

Off-policy evaluation

Continuous Control with Contexts, Provably

no code implementations • 30 Oct 2019 • Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.

Continuous Control

Characterizing Deep Learning Training Workloads on Alibaba-PAI

no code implementations • 14 Oct 2019 • Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia

One critical issue for efficiently operating practical AI clouds is to characterize the computing and data transfer demands of these workloads, and more importantly, the training performance given the underlying software framework and hardware configurations.


no code implementations • 25 Sep 2019 • Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

Modern complex sequential decision-making problems often require both low-level policy and high-level planning.

Decision Making Hierarchical Reinforcement Learning

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

no code implementations • 29 Aug 2019 • Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye

In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.


Voting-Based Multi-Agent Reinforcement Learning for Intelligent IoT

no code implementations • 2 Jul 2019 • Yue Xu, Zengde Deng, Mengdi Wang, Wenjun Xu, Anthony Man-Cho So, Shuguang Cui

The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT.

Decision Making Multi-agent Reinforcement Learning +2

Learning Markov models via low-rank optimization

no code implementations • 28 Jun 2019 • Ziwei Zhu, Xudong Li, Mengdi Wang, Anru Zhang

We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states.

Decision Making

Feature-Based Q-Learning for Two-Player Stochastic Games

no code implementations • 2 Jun 2019 • Zeyu Jia, Lin F. Yang, Mengdi Wang

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space.


Learning low-dimensional state embeddings and metastable clusters from time series data

no code implementations • NeurIPS 2019 • Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank.

Time Series

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

no code implementations • ICML 2020 • Lin F. Yang, Mengdi Wang

In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space.

reinforcement-learning reinforcement Learning

Learning to Control in Metric Space with Optimal Regret

1 code implementation • 5 May 2019 • Lin F. Yang, Chengzhuo Ni, Mengdi Wang

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces.

reinforcement-learning reinforcement Learning

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

no code implementations • 13 Feb 2019 • Lin F. Yang, Mengdi Wang

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model.


Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model

no code implementations • NeurIPS 2018 • Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.

Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks

no code implementations • 21 Nov 2018 • Mengdi Wang, Qing Zhang, Jun Yang, Xiaoyuan Cui, Wei Lin

In this method, the network is viewed as a computational graph, in which the vertices denote the computation nodes and edges represent the information flow.

Knowledge Distillation Model Compression

State Aggregation Learning from Markov Transition Data

no code implementations • NeurIPS 2019 • Yaqi Duan, Zheng Tracy Ke, Mengdi Wang

Our proposed method is a simple two-step algorithm: the first step is a spectral decomposition of the empirical transition matrix, and the second step conducts a linear transformation of singular vectors to find their approximate convex hull.

Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains

no code implementations • 14 Oct 2018 • Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan

The efficiency and statistical properties of our approach are illustrated on synthetic data.

Diffusion Approximations for Online Principal Component Estimation and Global Convergence

no code implementations • NeurIPS 2017 • Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang

In this paper, we propose to adopt diffusion approximation tools to study the dynamics of Oja's iteration, which is an online stochastic gradient descent method for principal component analysis.

Scalable Bilinear Pi Learning Using State and Action Features

no code implementations • ICML 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

1 code implementation • 5 Jun 2018 • Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.

Optimization and Control

Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient

1 code implementation • 1 Jun 2018 • Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan

Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming.

reinforcement-learning reinforcement Learning

Scalable Bilinear $π$ Learning Using State and Action Features

no code implementations • 27 Apr 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Estimation of Markov Chain via Rank-Constrained Likelihood

no code implementations • ICML 2018 • Xudong Li, Mengdi Wang, Anru Zhang

This paper studies the estimation of low-rank Markov chains from empirical trajectories.

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

no code implementations • NeurIPS 2018 • Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao

Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.

Dimensionality Reduction Stochastic Optimization +1

Spectral State Compression of Markov Processes

no code implementations • 8 Feb 2018 • Anru Zhang, Mengdi Wang

Model reduction of Markov processes is a basic problem in modeling state-transition systems.

Improved Oracle Complexity of Variance Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization

no code implementations • 7 Feb 2018 • Tianyi Lin, Chenyou Fan, Mengdi Wang

We consider the nonsmooth convex composition optimization problem where the objective is a composition of two finite-sum functions and analyze stochastic compositional variance reduced gradient (SCVRG) methods for it.

Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality

no code implementations • 7 Dec 2017 • Woon Sang Cho, Mengdi Wang

We believe that the primal-dual updates to the value and policy functions would expedite the learning process.

Q-Learning reinforcement-learning +1

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

1 code implementation • 27 Oct 2017 • Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye

Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor $\gamma\in(0, 1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy, with probability $1 - \delta$ in time \[ \tilde{O}\left( \left(|S|^2 |A| + \frac{|S| |A|}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right) . \]

Primal-Dual $π$ Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

no code implementations • 17 Oct 2017 • Mengdi Wang

Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions.

Online Factorization and Partition of Complex Networks From Random Walks

no code implementations • 22 May 2017 • Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang

We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.

Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

no code implementations • 20 May 2017 • Yi-Chen Chen, Mengdi Wang

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $\mathcal{S}$ and a finite action space $\mathcal{A}$.

Accelerating Stochastic Composition Optimization

no code implementations • NeurIPS 2016 • Mengdi Wang, Ji Liu, Ethan X. Fang

The ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with nonsmooth regularization penalty.

reinforcement-learning reinforcement Learning

Near-Optimal Stochastic Approximation for Online Principal Component Estimation

no code implementations • 16 Mar 2016 • Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang

We prove for the first time a nearly optimal finite-sample error bound for the online PCA algorithm.

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions

no code implementations • 14 Nov 2014 • Mengdi Wang, Ethan X. Fang, Han Liu

For smooth convex problems, the SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case.
