Search Results for author: Mengdi Wang

Found 120 papers, 16 papers with code

Parameter-Efficient Sparsity for Large Language Models Fine-Tuning

2 code implementations 23 May 2022 Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai

With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models.

Visual Adversarial Examples Jailbreak Aligned Large Language Models

1 code implementation 22 Jun 2023 Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal

Recently, there has been a surge of interest in integrating vision into Large Language Models (LLMs), exemplified by Visual Language Models (VLMs) such as Flamingo and GPT-4.

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

1 code implementation 27 Jun 2020 Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies.

regression Sparse Learning

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient

1 code implementation 4 Jun 2021 Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji

Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.

AutoML Model Compression

1xN Pattern for Pruning Convolutional Neural Networks

1 code implementation 31 May 2021 Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji

We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements and then applies similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations.

Network Pruning

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation 31 Jan 2022 Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning Reinforcement Learning (RL) +1

Provable Benefits of Representational Transfer in Reinforcement Learning

1 code implementation 29 May 2022 Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.

reinforcement-learning Reinforcement Learning (RL) +1

Deep Reinforcement Learning for Cost-Effective Medical Diagnosis

1 code implementation 20 Feb 2023 Zheng Yu, Yikuan Li, Joseph Kim, Kaixuan Huang, Yuan Luo, Mengdi Wang

In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost.

Anomaly Detection Medical Diagnosis +3

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

2 code implementations NeurIPS 2023 Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang

Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which could be easily applied to a wide range of click models.

Learning-To-Rank Offline RL +2

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

1 code implementation 27 Oct 2017 Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye

Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor $\gamma\in(0, 1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy, with probability $1 - \delta$, in time \[ \tilde{O}\left( \left(|S|^2 |A| + \frac{|S| |A|}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right). \]

Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

1 code implementation 5 Jun 2018 Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.

Optimization and Control

Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient

1 code implementation 1 Jun 2018 Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan

Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming.

reinforcement-learning Reinforcement Learning (RL)

TurboSVM-FL: Boosting Federated Learning through SVM Aggregation for Lazy Clients

1 code implementation 22 Jan 2024 Mengdi Wang, Anna Bodonhelyi, Efe Bozkir, Enkelejda Kasneci

Federated learning is a distributed collaborative machine learning paradigm that has gained strong momentum in recent years.

Federated Learning

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

no code implementations NeurIPS 2018 Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao

Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.

Dimensionality Reduction Stochastic Optimization +2

Scalable Bilinear $π$ Learning Using State and Action Features

no code implementations 27 Apr 2018 Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Improved Oracle Complexity of Variance Reduced Methods for Nonsmooth Convex Stochastic Composition Optimization

no code implementations 7 Feb 2018 Tianyi Lin, Chenyou Fan, Mengdi Wang

We consider the nonsmooth convex composition optimization problem where the objective is a composition of two finite-sum functions and analyze stochastic compositional variance reduced gradient (SCVRG) methods for them.

Estimation of Markov Chain via Rank-Constrained Likelihood

no code implementations ICML 2018 Xudong Li, Mengdi Wang, Anru Zhang

This paper studies the estimation of low-rank Markov chains from empirical trajectories.

Spectral State Compression of Markov Processes

no code implementations 8 Feb 2018 Anru Zhang, Mengdi Wang

Model reduction of Markov processes is a basic problem in modeling state-transition systems.

Clustering

Online Factorization and Partition of Complex Networks From Random Walks

no code implementations 22 May 2017 Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang

We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.

Clustering

Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality

no code implementations 7 Dec 2017 Woon Sang Cho, Mengdi Wang

We believe that the primal-dual updates to the value and policy functions would expedite the learning process.

Q-Learning reinforcement-learning +1

Primal-Dual $π$ Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems

no code implementations 17 Oct 2017 Mengdi Wang

Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions.

Near-Optimal Stochastic Approximation for Online Principal Component Estimation

no code implementations 16 Mar 2016 Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang

We prove for the first time a nearly optimal finite-sample error bound for the online PCA algorithm.

Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

no code implementations 20 May 2017 Yi-Chen Chen, Mengdi Wang

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $\mathcal{S}$ and a finite action space $\mathcal{A}$.

Accelerating Stochastic Composition Optimization

no code implementations NeurIPS 2016 Mengdi Wang, Ji Liu, Ethan X. Fang

The ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with a nonsmooth regularization penalty.

reinforcement-learning Reinforcement Learning (RL)

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions

no code implementations 14 Nov 2014 Mengdi Wang, Ethan X. Fang, Han Liu

For smooth convex problems, the SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case.
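
For readers unfamiliar with compositional problems of the form min_x f(E[g_w(x)]), the following is a minimal sketch of a basic two-timescale compositional update, not the accelerated variant whose rates are quoted above. An auxiliary variable y tracks a running estimate of the inner expectation, and x takes a chain-rule gradient step using that estimate. The function names (`scgd`, `grad_f`, `g_sample`, `g_jac_sample`), the scalar toy problem, and the step-size schedules are all illustrative assumptions, not taken from the listing.

```python
import random


def scgd(grad_f, g_sample, g_jac_sample, x0, n_steps, seed=0):
    """Basic compositional SGD sketch for min_x f(E[g_w(x)]) with scalar x.

    All names and step-size schedules here are hypothetical choices
    made for illustration only.
    """
    rng = random.Random(seed)
    x, y = x0, 0.0
    for k in range(1, n_steps + 1):
        alpha = 0.1 * k ** -0.75  # slow step size for the decision variable x
        beta = k ** -0.5          # faster step size for the tracking variable y
        w = rng.gauss(0.0, 1.0)   # one noisy sample of the inner function
        # Running average that tracks the inner expectation E[g_w(x)].
        y = (1.0 - beta) * y + beta * g_sample(x, w)
        # Chain rule: sampled inner derivative times outer gradient at y.
        x = x - alpha * g_jac_sample(x, w) * grad_f(y)
    return x


# Toy problem: f(y) = (y - 1)^2 and g_w(x) = 2x + w with E[w] = 0,
# so f(E[g_w(x)]) = (2x - 1)^2, which is minimized at x = 0.5.
x_hat = scgd(
    grad_f=lambda y: 2.0 * (y - 1.0),
    g_sample=lambda x, w: 2.0 * x + w,
    g_jac_sample=lambda x, w: 2.0,
    x0=0.0,
    n_steps=20000,
)
```

The two step sizes decay at different rates so that y settles faster than x moves; plugging a single noisy sample of g into f's gradient instead would in general give a biased gradient estimate.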

Diffusion Approximations for Online Principal Component Estimation and Global Convergence

no code implementations NeurIPS 2017 Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang

In this paper, we propose to adopt diffusion approximation tools to study the dynamics of Oja's iteration, which is an online stochastic gradient descent method for principal component analysis.
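
As a rough illustration of the Oja's iteration mentioned above, the sketch below processes one sample at a time, takes a stochastic gradient step along (x x^T) w, and renormalizes w to unit length. The function names, the decaying step-size schedule, and the synthetic 2-D data stream are assumptions made for this example, not details from the paper.

```python
import math
import random


def oja_top_component(samples, dim, step=lambda t: 1.0 / (t + 100.0)):
    """Estimate the leading principal component of zero-mean samples.

    The step-size schedule is a hypothetical choice for illustration.
    """
    # Deterministic, non-degenerate start: equal weight on every coordinate.
    w = [1.0 / math.sqrt(dim)] * dim
    for t, x in enumerate(samples):
        # Stochastic gradient of 0.5 * w^T E[x x^T] w is (x x^T) w = <x, w> x.
        proj = sum(xi * wi for xi, wi in zip(x, w))
        eta = step(t)
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        # Project back onto the unit sphere.
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


def synthetic_stream(n, seed=0):
    """Zero-mean 2-D samples with variance 9 along axis 0 and 1 along axis 1."""
    rng = random.Random(seed)
    return [(3.0 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


w_hat = oja_top_component(synthetic_stream(5000), dim=2)
```

On this synthetic stream the true leading component is the first coordinate axis, so `w_hat` should end up close to (1, 0) or (-1, 0); the sign is arbitrary because a principal direction is only defined up to sign.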

Adaptive Low-Nonnegative-Rank Approximation for State Aggregation of Markov Chains

no code implementations 14 Oct 2018 Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan

The efficiency and statistical properties of our approach are illustrated on synthetic data.

State Aggregation Learning from Markov Transition Data

no code implementations NeurIPS 2019 Yaqi Duan, Zheng Tracy Ke, Mengdi Wang

Our proposed method is a simple two-step algorithm: the first step is a spectral decomposition of the empirical transition matrix, and the second step applies a linear transformation to the singular vectors to find their approximate convex hull.

Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks

no code implementations 21 Nov 2018 Mengdi Wang, Qing Zhang, Jun Yang, Xiaoyuan Cui, Wei Lin

In this method, the network is viewed as a computational graph, in which the vertices denote the computation nodes and edges represent the information flow.

Knowledge Distillation Model Compression

Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model

no code implementations NeurIPS 2018 Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.

Scalable Bilinear Pi Learning Using State and Action Features

no code implementations ICML 2018 Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

no code implementations 13 Feb 2019 Lin F. Yang, Mengdi Wang

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model.

Q-Learning

Learning to Control in Metric Space with Optimal Regret

1 code implementation 5 May 2019 Lin F. Yang, Chengzhuo Ni, Mengdi Wang

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces.

reinforcement-learning Reinforcement Learning (RL)

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

no code implementations ICML 2020 Lin F. Yang, Mengdi Wang

In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space.

reinforcement-learning Reinforcement Learning (RL)

Learning low-dimensional state embeddings and metastable clusters from time series data

no code implementations NeurIPS 2019 Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang

This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank.

Clustering Time Series +1

Feature-Based Q-Learning for Two-Player Stochastic Games

no code implementations 2 Jun 2019 Zeyu Jia, Lin F. Yang, Mengdi Wang

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space.

Q-Learning Vocal Bursts Valence Prediction

Learning Markov models via low-rank optimization

no code implementations 28 Jun 2019 Ziwei Zhu, Xudong Li, Mengdi Wang, Anru Zhang

We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states.

Decision Making

Voting-Based Multi-Agent Reinforcement Learning for Intelligent IoT

no code implementations 2 Jul 2019 Yue Xu, Zengde Deng, Mengdi Wang, Wenjun Xu, Anthony Man-Cho So, Shuguang Cui

The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT.

Decision Making Multi-agent Reinforcement Learning +2

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

no code implementations 29 Aug 2019 Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye

In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.

Q-Learning

Characterizing Deep Learning Training Workloads on Alibaba-PAI

no code implementations 14 Oct 2019 Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia

One critical issue for efficiently operating practical AI clouds is to characterize the computing and data transfer demands of these workloads and, more importantly, the training performance given the underlying software framework and hardware configurations.

Continuous Control with Contexts, Provably

no code implementations 30 Oct 2019 Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.

Continuous Control

Sketching Transformed Matrices with Applications to Natural Language Processing

no code implementations 23 Feb 2020 Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang

We show that our approach obtains small error and is efficient in both space and time.

Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation

no code implementations ICML 2020 Yaqi Duan, Mengdi Wang

We prove that this method is information-theoretically optimal and has nearly minimal estimation error.

Off-policy evaluation

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

no code implementations 27 Feb 2020 Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Concept Annotation for Intelligent Textbooks

no code implementations 22 May 2020 Mengdi Wang, Hung Chau, Khushboo Thaker, Peter Brusilovsky, Daqing He

The outcomes of our work include a validated knowledge engineering procedure, a code-book for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further research.

Model-Based Reinforcement Learning with Value-Targeted Regression

no code implementations ICML 2020 Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

We propose a model-based RL algorithm that is based on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected is constructed.

Model-based Reinforcement Learning regression +2

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

no code implementations NeurIPS 2020 Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang

Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.

reinforcement-learning Reinforcement Learning (RL) +1

Generalized Leverage Score Sampling for Neural Networks

no code implementations NeurIPS 2020 Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu

Leverage score sampling is a powerful technique that originates from theoretical computer science and can be used to speed up the solution of a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, cutting plane methods, graph sparsification, maximum matching and max-flow.

Learning Theory regression

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

no code implementations 9 Nov 2020 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.

Reinforcement Learning (RL)

High-Dimensional Sparse Linear Bandits

no code implementations NeurIPS 2020 Botao Hao, Tor Lattimore, Mengdi Wang

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.

Vocal Bursts Intensity Prediction

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations 8 Nov 2020 Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

feature selection Model Selection +2

Online Sparse Reinforcement Learning

no code implementations 8 Nov 2020 Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

reinforcement-learning Reinforcement Learning (RL)

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

no code implementations NeurIPS 2020 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.

reinforcement-learning Reinforcement Learning (RL)

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

no code implementations 6 Feb 2021 Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less understood.

Off-policy evaluation

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

no code implementations NeurIPS 2021 Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang

By assuming the overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.

Reinforcement Learning (RL)

Learning Good State and Action Representations via Tensor Decomposition

no code implementations 3 May 2021 Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Anru Zhang, Mengdi Wang

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.

Tensor Decomposition

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

no code implementations 29 May 2021 Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".

Multi-agent Reinforcement Learning

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

no code implementations 15 Jun 2021 Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by heavier-tailed distributions with tail-index parameter alpha, which increases the likelihood of jumping in state space.

Continuous Control Decision Making

Boosting the Convergence of Reinforcement Learning-based Auto-pruning Using Historical Data

no code implementations 16 Jul 2021 Jiandong Mu, Mengdi Wang, Feiwen Zhu, Jun Yang, Wei Lin, Wei Zhang

Reinforcement learning (RL)-based auto-pruning has been further proposed to automate the DNN pruning process to avoid expensive hand-crafted work.

Neural Network Compression reinforcement-learning +2

Optimal policy evaluation using kernel-based temporal difference methods

no code implementations 24 Sep 2021 Yaqi Duan, Mengdi Wang, Martin J. Wainwright

Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error.

Cell2State: Learning Cell State Representations From Barcoded Single-Cell Gene-Expression Transitions

no code implementations 29 Sep 2021 Yu Wu, Joseph Chahn Kim, Chengzhuo Ni, Le Cong, Mengdi Wang

Genetic barcoding coupled with single-cell sequencing technology enables direct measurement of cell-to-cell transitions and gene-expression evolution over a long timespan.

Dimensionality Reduction

PROVABLY BENEFITS OF DEEP HIERARCHICAL RL

no code implementations 25 Sep 2019 Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

Modern complex sequential decision-making problems often require both low-level policy and high-level planning.

Decision Making Hierarchical Reinforcement Learning

A Duality Approach for Regret Minimization in Average-Award Ergodic Markov Decision Processes

no code implementations L4DC 2020 Hao Gong, Mengdi Wang

In light of the Bellman duality, we propose a novel value-policy gradient algorithm to explore and act in infinite-horizon Average-reward Markov Decision Process (AMDP) and show that it has sublinear regret.

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

no code implementations 31 Jan 2022 Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

no code implementations 10 Feb 2022 Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.

Off-policy evaluation

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

no code implementations 11 Mar 2022 Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang

However, a precise understanding of the statistical limits with function representations remains elusive, even when such a representation is linear.

Decision Making reinforcement-learning +1

Byzantine-Robust Online and Offline Distributed Reinforcement Learning

no code implementations 1 Jun 2022 Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu

We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.

reinforcement-learning Reinforcement Learning (RL)

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations 5 Jun 2022 Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

BIG-bench Machine Learning Evolutionary Algorithms +2

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

no code implementations 6 Jun 2022 Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks.

Off-policy evaluation

Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality

no code implementations 10 Jun 2022 Ming Yin, Wenjing Chen, Mengdi Wang, Yu-Xiang Wang

Goal-oriented Reinforcement Learning, where the agent needs to reach the goal state while simultaneously minimizing the cost, has received significant attention in real-world applications.

Communication Efficient Distributed Learning for Kernelized Contextual Bandits

no code implementations 10 Jun 2022 Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang

We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.

Multi-Armed Bandits

Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks

no code implementations 22 Jun 2022 Shuoguang Yang, Xuezhou Zhang, Mengdi Wang

This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task, multi-agent learning and federated learning.

Bilevel Optimization Federated Learning +3

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

no code implementations 29 Jun 2022 Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang

Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a few seed nodes.

Model-based Reinforcement Learning reinforcement-learning +1

Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

no code implementations 3 Oct 2022 Ming Yin, Mengdi Wang, Yu-Xiang Wang

Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications.

Decision Making Offline RL +3

Representation Learning for General-sum Low-rank Markov Games

no code implementations 30 Oct 2022 Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang

To our best knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.

Representation Learning

Energy System Digitization in the Era of AI: A Three-Layered Approach towards Carbon Neutrality

no code implementations 2 Nov 2022 Le Xie, Tong Huang, Xiangtian Zheng, Yan Liu, Mengdi Wang, Vijay Vittal, P. R. Kumar, Srinivas Shakkottai, Yi Cui

The transition towards carbon-neutral electricity is one of the biggest game changers in addressing climate change since it addresses the dual challenges of removing carbon emissions from the two largest sectors of emitters: electricity and transportation.

Decision Making

Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP

no code implementations 1 Dec 2022 Jinghan Wang, Mengdi Wang, Lin F. Yang

This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator).

Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data

no code implementations 14 Feb 2023 Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang

Furthermore, the generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.

ChipGPT: How far are we from natural language hardware design

no code implementations 23 May 2023 Kaiyan Chang, Ying Wang, Haimeng Ren, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li

As large language models (LLMs) like ChatGPT have exhibited unprecedented machine intelligence, they have also shown great performance in assisting hardware engineers to realize higher-efficiency logic design via natural language interaction.

Eye-tracked Virtual Reality: A Comprehensive Survey on Methods and Privacy Challenges

no code implementations 23 May 2023 Efe Bozkir, Süleyman Özdel, Mengdi Wang, Brendan David-John, Hong Gao, Kevin Butler, Eakta Jain, Enkelejda Kasneci

Latest developments in computer hardware, sensor technologies, and artificial intelligence can make virtual reality (VR) and virtual spaces an important part of human everyday life.

Gaze Estimation Pupil Detection

Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism

no code implementations 29 May 2023 Zihao Li, Zhuoran Yang, Mengdi Wang

In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where we aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.

Decision Making Econometrics +2

Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations

no code implementations 2 Jun 2023 Minshuo Chen, Jie Meng, Yu Bai, Yinyu Ye, H. Vincent Poor, Mengdi Wang

We present algorithms and establish near-optimal regret upper and lower bounds, of the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H) SAK})$, for RL in the delayed and missing observation settings.

Reinforcement Learning (RL)

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

no code implementations 21 Jun 2023 Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang

In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.

Efficient Exploration Representation Learning

Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories

no code implementations 26 Jun 2023 Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wenjing Liao, Tuo Zhao

Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures.

regression

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

no code implementations 4 Jul 2023 Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom.

Scaling In-Context Demonstrations with Structured Attention

no code implementations 5 Jul 2023 Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang

However, the in-context learning capabilities of LLMs are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations.

In-Context Learning Sentence

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

no code implementations 6 Jul 2023 Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu Bai

This paper studies the sample efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst case.

Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks

no code implementations 26 Jul 2023 Siyu Chen, Mengdi Wang, Zhuoran Yang

The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data.

Decision Making LEMMA +1

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning

no code implementations 3 Aug 2023 Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback.

Bilevel Optimization Procedure Learning +2

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds

no code implementations 25 Sep 2023 Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0, 1]$ indicates the smoothness of the environment.

Policy Gradient Methods Reinforcement Learning (RL)

A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions

no code implementations 5 Oct 2023 Yanyi Chu, Dan Yu, Yupeng Li, Kaixuan Huang, Yue Shen, Le Cong, Jason Zhang, Mengdi Wang

The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level.

Language Modelling Translation

Federated Multi-Level Optimization over Decentralized Networks

no code implementations 10 Oct 2023 Shuoguang Yang, Xuezhou Zhang, Mengdi Wang

Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization.

Distributed Optimization Meta-Learning +1

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

no code implementations 16 Oct 2023 Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang

In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve evaluating an intermediate policy over a reward learned from human preference data with distribution shift, known as off-policy evaluation (OPE).

Off-policy evaluation reinforcement-learning

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective

no code implementations 29 Nov 2023 Lei Zhao, Mengdi Wang, Yu Bai

Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an expert policy -- plays a critical role in developing intelligent systems.

Offline RL reinforcement-learning

Scalable Normalizing Flows Enable Boltzmann Generators for Macromolecules

no code implementations 8 Jan 2024 Joseph C. Kim, David Bloore, Karan Kapoor, Jun Feng, Ming-Hong Hao, Mengdi Wang

We demonstrate that standard architectures and training strategies, such as maximum likelihood alone, fail while our novel architecture and multi-stage training strategy are able to model the conformational distributions of protein G and HP35.

Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

no code implementations 8 Jan 2024 Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang

To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model.

Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy

no code implementations 6 Feb 2024 Efe Bozkir, Süleyman Özdel, Ka Hei Carrie Lau, Mengdi Wang, Hong Gao, Enkelejda Kasneci

Lastly, we speculate that combining the information provided to LLM-powered environments by the users and the biometric data obtained through the sensors might lead to novel privacy invasions.

Prompt Engineering

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

no code implementations 7 Feb 2024 Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson

We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels.

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

no code implementations 14 Feb 2024 Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.

Fairness reinforcement-learning

Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning

no code implementations 16 Feb 2024 Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang

Designing algorithms for a constrained convex MDP faces several challenges, including (1) handling the large state space, (2) managing the exploration/exploitation tradeoff, and (3) solving the constrained optimization where the objective and the constraint are both nonlinear functions of the visitation measure.

reinforcement-learning

Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models

no code implementations 3 Mar 2024 Yuchen Wu, Minshuo Chen, Zihao Li, Mengdi Wang, Yuting Wei

Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.

Image Generation

Regularized DeepIV with Model Selection

no code implementations 7 Mar 2024 Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara

In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.

Model Selection regression

Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework

no code implementations 17 Mar 2024 Kaiyan Chang, Kun Wang, Nan Yang, Ying Wang, Dantong Jin, Wenlong Zhu, Zhirong Chen, Cangyuan Li, Hao Yan, Yunhao Zhou, Zhuoliang Zhao, Yuan Cheng, Yudong Pan, Yiqi Liu, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li

Our 13B model (ChipGPT-FT) has a pass rate improvement compared with GPT-3.5 in Verilog generation and outperforms it in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script data.

Data Augmentation

Offline Multitask Representation Learning for Reinforcement Learning

no code implementations 18 Mar 2024 Haque Ishfaq, Thanh Nguyen-Tang, Songtao Feng, Raman Arora, Mengdi Wang, Ming Yin, Doina Precup

We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation.

reinforcement-learning Reinforcement Learning (RL) +1

Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory

no code implementations 18 Mar 2024 Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen

Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning.

Image Generation reinforcement-learning

Embodied LLM Agents Learn to Cooperate in Organized Teams

no code implementations 19 Mar 2024 Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang

Large Language Models (LLMs) have emerged as integral tools for reasoning, planning, and decision-making, drawing upon their extensive world knowledge and proficiency in language-related tasks.

Decision Making World Knowledge

Diffusion Model for Data-Driven Black-Box Optimization

no code implementations 20 Mar 2024 Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang

In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables.

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization

no code implementations 11 Apr 2024 Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang

In this paper, we review emerging applications of diffusion models, understanding their sample generation under various controls.
