2 code implementations • 23 May 2022 • Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai
With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models.
1 code implementation • 22 Jun 2023 • Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal
Recently, there has been a surge of interest in integrating vision into Large Language Models (LLMs), exemplified by Visual Language Models (VLMs) such as Flamingo and GPT-4.
1 code implementation • 27 Jun 2020 • Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang, Mengdi Wang, Tuo Zhao
We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies.
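As a rough illustration of pathwise coordinate optimization for one of the listed problems (sparse linear regression), the sketch below runs cyclic coordinate descent with soft-thresholding along a decreasing $\lambda$ grid, warm-starting each fit from the previous solution. It assumes the lasso objective $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$; the function names are hypothetical, it is not picasso's API, and it omits the active set selection strategies the abstract mentions.

```python
def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, beta=None, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    # Residual r = y - X beta, kept up to date incrementally.
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def lasso_path(X, y, lambdas, n_iter=200):
    """Solve along a decreasing lambda grid, warm-starting each fit from the previous one."""
    path, beta = [], None
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta, n_iter)
        path.append((lam, list(beta)))
    return path
```

Warm starts are what make the path cheap: neighbouring solutions on the grid are usually close, so each fit needs few sweeps.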
1 code implementation • 4 Jun 2021 • Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji
Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.
1 code implementation • 31 May 2021 • Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji
We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements and then applies similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations.
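The key constraint in the rearrangement workflow is that permuting the output channels of one layer must be matched by the same permutation of the next layer's input channels. A minimal sketch, assuming dense layers as a stand-in for 1x1 convolutions and an L1-norm ordering chosen purely for illustration (not necessarily the paper's criterion), checks that the network's function is unchanged.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def forward(W1, W2, x):
    """Two-layer network W2 @ relu(W1 @ x); stands in for two 1x1 convolutions."""
    return matvec(W2, relu(matvec(W1, x)))


def rearrange(W1, W2):
    """Reorder layer-1 output channels by descending L1 norm (hypothetical criterion)
    and apply the same permutation to the layer-2 input channels."""
    order = sorted(range(len(W1)), key=lambda k: -sum(abs(w) for w in W1[k]))
    W1_new = [W1[k][:] for k in order]                # permute rows (output channels)
    W2_new = [[row[k] for k in order] for row in W2]  # permute columns (input channels)
    return W1_new, W2_new, order
```

Because the elementwise nonlinearity commutes with a permutation, applying the same `order` on both sides leaves every output identical up to floating-point summation order.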
1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.
1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.
1 code implementation • EACL 2021 • Woon Sang Cho, Yizhe Zhang, Sudha Rao, Asli Celikyilmaz, Chenyan Xiong, Jianfeng Gao, Mengdi Wang, Bill Dolan
In the supervised learning (SL) stage, a single-document question generator is trained.
1 code implementation • CVPR 2021 • Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, Jincheng Ma, Qi Tian, Rongrong Ji
Channel pruning and tensor decomposition have received extensive attention in convolutional neural network compression.
1 code implementation • 20 Feb 2023 • Zheng Yu, Yikuan Li, Joseph Kim, Kaixuan Huang, Yuan Luo, Mengdi Wang
In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost.
2 code implementations • NeurIPS 2023 • Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang
Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which could be easily applied to a wide range of click models.
1 code implementation • 27 Oct 2017 • Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye
Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor $\gamma\in(0, 1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy, with probability $1 - \delta$, in time \[ \tilde{O}\left( \left(|S|^2 |A| + \frac{|S| |A|}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right). \]
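For context on what computing an $\epsilon$-optimal policy involves, the sketch below is classical value iteration on an explicitly given DMDP; it is not the randomized, sampling-based method of the paper. It uses the standard stopping rule: once successive iterates differ by less than $\epsilon(1-\gamma)/(2\gamma)$ in sup norm, the greedy policy is $\epsilon$-optimal. The transition format (`P[s][a]` as a list of `(next_state, prob)` pairs) and the function names are assumptions for illustration.

```python
def q_value(P, R, V, gamma, s, a):
    """One-step lookahead: R[s][a] + gamma * sum_t P(t | s, a) * V[t]."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])


def value_iteration(P, R, gamma, eps):
    """Return (V, policy); the greedy policy w.r.t. the final V is eps-optimal."""
    n = len(P)
    V = [0.0] * n
    tol = eps * (1 - gamma) / (2 * gamma)
    while True:
        # Bellman optimality update for every state.
        V_new = [max(q_value(P, R, V, gamma, s, a) for a in range(len(P[s])))
                 for s in range(n)]
        gap = max(abs(u - v) for u, v in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    policy = [max(range(len(P[s])), key=lambda a: q_value(P, R, V, gamma, s, a))
              for s in range(n)]
    return V, policy
```

Each sweep costs $O(|S|^2|A|)$ with dense transitions, and the number of sweeps grows like $\frac{1}{1-\gamma}\log\frac{1}{\epsilon}$; the paper's bound improves on this dependence through sampling and variance reduction.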
1 code implementation • 5 Jun 2018 • Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.
1 code implementation • 1 Jun 2018 • Tianyi Lin, Chenyou Fan, Mengdi Wang, Michael I. Jordan
Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multi-stage stochastic programming.
1 code implementation • 22 Jan 2024 • Mengdi Wang, Anna Bodonhelyi, Efe Bozkir, Enkelejda Kasneci
Federated learning is a distributed collaborative machine learning paradigm that has gained strong momentum in recent years.
no code implementations • NeurIPS 2018 • Minshuo Chen, Lin Yang, Mengdi Wang, Tuo Zhao
Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution.
no code implementations • 27 Apr 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang
In this work, we study a primal-dual formulation of approximate linear programming (ALP) and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.
no code implementations • 26 Feb 2018 • Sham Kakade, Mengdi Wang, Lin F. Yang
There is a technical issue in the analysis that is not easily fixable.
no code implementations • 7 Feb 2018 • Tianyi Lin, Chenyou Fan, Mengdi Wang
We consider the nonsmooth convex composition optimization problem where the objective is a composition of two finite-sum functions and analyze stochastic compositional variance reduced gradient (SCVRG) methods for them.
no code implementations • ICML 2018 • Xudong Li, Mengdi Wang, Anru Zhang
This paper studies the estimation of low-rank Markov chains from empirical trajectories.
no code implementations • 8 Feb 2018 • Anru Zhang, Mengdi Wang
Model reduction of Markov processes is a basic problem in modeling state-transition systems.
no code implementations • 22 May 2017 • Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang
We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.
no code implementations • 7 Dec 2017 • Woon Sang Cho, Mengdi Wang
We believe that the primal-dual updates to the value and policy functions would expedite the learning process.
no code implementations • 17 Oct 2017 • Mengdi Wang
Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions.
no code implementations • 16 Mar 2016 • Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang
We prove for the first time a nearly optimal finite-sample error bound for the online PCA algorithm.
no code implementations • 20 May 2017 • Yi-Chen Chen, Mengdi Wang
We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space $\mathcal{S}$ and a finite action space $\mathcal{A}$.
no code implementations • 8 Dec 2016 • Yi-Chen Chen, Mengdi Wang
We study the online estimation of the optimal policy of a Markov decision process (MDP).
no code implementations • NeurIPS 2016 • Mengdi Wang, Ji Liu, Ethan X. Fang
ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with a nonsmooth regularization penalty.
no code implementations • 12 Nov 2015 • Mengdi Wang, Yi-Chen Chen, Jialin Liu, Yuantao Gu
Consider convex optimization problems subject to a large number of constraints.
no code implementations • 14 Nov 2014 • Mengdi Wang, Ethan X. Fang, Han Liu
For smooth convex problems, the SCGD can be accelerated to converge at a rate of $O(k^{-2/7})$ in the general case and $O(k^{-4/5})$ in the strongly convex case.
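SCGD targets problems of the form $\min_x f(\mathbb{E}_w[g(x; w)])$. A minimal sketch of the basic (non-accelerated) two-timescale update, assuming scalar $x$: an auxiliary variable $y$ tracks the inner expectation with step $\beta_k$, and $x$ takes a chain-rule step with step $\alpha_k$. The schedule $\alpha_k = 0.5k^{-3/4}$, $\beta_k = k^{-1/2}$ and the function names are illustrative assumptions; this is not the accelerated variant whose rates the abstract states.

```python
import random


def scgd(sample_g, grad_g, grad_f, x0, n_iter=20000, seed=0):
    """Basic stochastic compositional gradient descent for min_x f(E_w[g(x; w)]).

    y tracks E_w[g(x; w)] on the faster timescale (beta_k); x takes chain-rule
    steps on the slower timescale (alpha_k).
    """
    rng = random.Random(seed)
    x, y = x0, 0.0
    for k in range(1, n_iter + 1):
        alpha, beta = 0.5 / k ** 0.75, 1.0 / k ** 0.5
        w = rng.gauss(0.0, 1.0)
        y = (1 - beta) * y + beta * sample_g(x, w)  # running estimate of the inner mean
        x = x - alpha * grad_g(x, w) * grad_f(y)    # chain rule using the tracked y
    return x
```

For example, with $g(x; w) = x + 1 + 0.1w$ and $f(y) = y^2/2$ the objective is $(x+1)^2/2$, and `scgd(lambda x, w: x + 1 + 0.1 * w, lambda x, w: 1.0, lambda y: y, x0=5.0)` should end close to $-1$. A single sample cannot be plugged into $f$ directly because $f(\mathbb{E}[g]) \neq \mathbb{E}[f(g)]$ in general, which is why $y$ is averaged.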
no code implementations • NeurIPS 2017 • Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang
In this paper, we propose to adopt the diffusion approximation tools to study the dynamics of Oja's iteration, which is an online stochastic gradient descent method for principal component analysis.
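Oja's iteration itself is short: each streamed sample nudges the current unit vector toward the direction of larger variance, followed by renormalization. The sketch below assumes a constant step size `eta` and a random unit starting vector; both are illustrative choices, and it does not reproduce the paper's diffusion-approximation analysis.

```python
import math
import random


def oja_top_component(samples, eta=0.01, seed=0):
    """Oja's iteration: w <- normalize(w + eta * x * (x . w)) for each streamed x."""
    rng = random.Random(seed)
    d = len(samples[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(a * a for a in w))
    w = [a / norm for a in w]
    for x in samples:
        proj = sum(a * b for a, b in zip(x, w))         # x . w
        w = [a + eta * proj * b for a, b in zip(w, x)]  # stochastic gradient step
        norm = math.sqrt(sum(a * a for a in w))
        w = [a / norm for a in w]                       # back to the unit sphere
    return w
```

With samples drawn from a distribution whose covariance is $\mathrm{diag}(4, 1)$, the returned vector should align (up to sign) with the first coordinate axis; the size of the remaining fluctuation scales with `eta`.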
no code implementations • 14 Oct 2018 • Yaqi Duan, Mengdi Wang, Zaiwen Wen, Yaxiang Yuan
The efficiency and statistical properties of our approach are illustrated on synthetic data.
no code implementations • WS 2019 • Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Michel Galley, Chris Brockett, Mengdi Wang, Jianfeng Gao
Generating coherent and cohesive long-form texts is a challenging task.
no code implementations • NeurIPS 2019 • Yaqi Duan, Zheng Tracy Ke, Mengdi Wang
Our proposed method is a simple two-step algorithm: the first step is a spectral decomposition of the empirical transition matrix, and the second step applies a linear transformation to the singular vectors to find their approximate convex hull.
no code implementations • 21 Nov 2018 • Mengdi Wang, Qing Zhang, Jun Yang, Xiaoyuan Cui, Wei Lin
In this method, the network is viewed as a computational graph, in which the vertices denote the computation nodes and edges represent the information flow.
no code implementations • NeurIPS 2018 • Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.
no code implementations • ICML 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang
In this work, we study a primal-dual formulation of approximate linear programming (ALP) and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.
no code implementations • 13 Feb 2019 • Lin F. Yang, Mengdi Wang
Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model.
1 code implementation • 5 May 2019 • Lin F. Yang, Chengzhuo Ni, Mengdi Wang
We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces.
no code implementations • ICML 2020 • Lin F. Yang, Mengdi Wang
In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space.
no code implementations • NeurIPS 2019 • Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang
This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank.
no code implementations • 2 Jun 2019 • Zeyu Jia, Lin F. Yang, Mengdi Wang
Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space.
no code implementations • 24 May 2019 • Hao Lu, Mengdi Wang
Joint replacement is the most common inpatient surgical treatment in the US.
no code implementations • 28 Jun 2019 • Ziwei Zhu, Xudong Li, Mengdi Wang, Anru Zhang
We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states.
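The simplest estimator of a transition model from a single trajectory is the count-based empirical matrix, which spectral and low-rank methods typically start from. The sketch below shows only that baseline, not the paper's estimator; returning a uniform row for states that are never left is an arbitrary convention chosen for illustration.

```python
def empirical_transition_matrix(trajectory, n_states):
    """Maximum-likelihood estimate P_hat[i][j] = N(i -> j) / N(i) from one trajectory.

    Rows for states never left are returned as uniform (an arbitrary convention).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for s, t in zip(trajectory, trajectory[1:]):
        counts[s][t] += 1
    P_hat = []
    for row in counts:
        total = sum(row)
        P_hat.append([c / total for c in row] if total else [1.0 / n_states] * n_states)
    return P_hat
```

This estimator has $n^2$ free entries, so without further structure it needs far more transitions than states; exploiting low rank is what allows a trajectory length proportional to the number of states.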
no code implementations • 2 Jul 2019 • Yue Xu, Zengde Deng, Mengdi Wang, Wenjun Xu, Anthony Man-Cho So, Shuguang Cui
The recent success of single-agent reinforcement learning (RL) in Internet of Things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT.
no code implementations • 29 Aug 2019 • Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye
In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.
no code implementations • 14 Oct 2019 • Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia
One critical issue for efficiently operating practical AI clouds is to characterize the computing and data transfer demands of these workloads and, more importantly, the training performance given the underlying software framework and hardware configurations.
no code implementations • 30 Oct 2019 • Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.
no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
We show that our approach obtains small error and is efficient in both space and time.
no code implementations • ICML 2020 • Yaqi Duan, Mengdi Wang
We prove that this method is information-theoretically optimal and has nearly minimal estimation error.
no code implementations • 27 Feb 2020 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
no code implementations • 22 May 2020 • Mengdi Wang, Hung Chau, Khushboo Thaker, Peter Brusilovsky, Daqing He
The outcomes of our work include a validated knowledge engineering procedure, a code-book for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further research.
no code implementations • ICML 2020 • Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang
We propose a model-based RL algorithm built on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected is constructed.
no code implementations • NeurIPS 2020 • Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang
Analogously to the Policy Gradient Theorem \cite{sutton2000policy} available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
no code implementations • NeurIPS 2020 • Jason D. Lee, Ruoqi Shen, Zhao Song, Mengdi Wang, Zheng Yu
Leverage score sampling is a powerful technique that originates from theoretical computer science and can be used to speed up algorithms for a large number of fundamental problems, e.g., linear regression, linear programming, semi-definite programming, cutting plane method, graph sparsification, maximum matching and max-flow.
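For the linear regression case, the leverage score of row $i$ of a full-column-rank $A$ is the squared norm of row $i$ of $Q$ in a thin QR factorization $A = QR$; the scores lie in $[0, 1]$ and sum to the number of columns. The sketch below computes them with modified Gram-Schmidt and then samples rows with probability proportional to leverage, rescaling by $1/\sqrt{m p_i}$. It is a generic textbook construction, not the method of this paper, and the function names are hypothetical.

```python
import math
import random


def leverage_scores(A):
    """Row leverage scores via thin QR (modified Gram-Schmidt); A must have full column rank."""
    n, d = len(A), len(A[0])
    Q = []
    for j in range(d):
        v = [A[i][j] for i in range(n)]
        for q in Q:
            coef = sum(a * b for a, b in zip(q, v))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        Q.append([a / norm for a in v])
    return [sum(Q[j][i] ** 2 for j in range(d)) for i in range(n)]


def leverage_sample(A, b, m, seed=0):
    """Sample m rows of (A, b) with prob. proportional to leverage, rescaled by 1/sqrt(m p_i)."""
    rng = random.Random(seed)
    h = leverage_scores(A)
    total = sum(h)  # equals d when A has full column rank
    probs = [x / total for x in h]
    idx = rng.choices(range(len(A)), weights=probs, k=m)
    scale = [1.0 / math.sqrt(m * probs[i]) for i in idx]
    SA = [[s * a for a in A[i]] for s, i in zip(scale, idx)]
    Sb = [s * b[i] for s, i in zip(scale, idx)]
    return SA, Sb
```

Solving least squares on the smaller `(SA, Sb)` then approximates the full problem; rows with high leverage are the ones a uniform sample would most likely miss.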
no code implementations • 9 Nov 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.
no code implementations • NeurIPS 2020 • Botao Hao, Tor Lattimore, Mengdi Wang
Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.
no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.
no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.
no code implementations • NeurIPS 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan
Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.
no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are less well understood.
no code implementations • NeurIPS 2021 • Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang
By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a globally $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.
no code implementations • 3 May 2021 • Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Anru Zhang, Mengdi Wang
The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.
no code implementations • 29 May 2021 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".
no code implementations • 15 Jun 2021 • Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel
To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by heavier-tailed distributions, with tails governed by a tail-index parameter $\alpha$; heavier tails increase the likelihood of jumping in state space.
no code implementations • 16 Jul 2021 • Jiandong Mu, Mengdi Wang, Feiwen Zhu, Jun Yang, Wei Lin, Wei zhang
Reinforcement learning (RL)-based auto-pruning has been further proposed to automate the DNN pruning process to avoid expensive hand-crafted work.
no code implementations • 24 Sep 2021 • Yaqi Duan, Mengdi Wang, Martin J. Wainwright
Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error.
no code implementations • 29 Sep 2021 • Yu Wu, Joseph Chahn Kim, Chengzhuo Ni, Le Cong, Mengdi Wang
Genetic barcoding coupled with single-cell sequencing technology enables direct measurement of cell-to-cell transitions and gene-expression evolution over a long timespan.
no code implementations • 27 Sep 2018 • Woon Sang Cho, Pengchuan Zhang, Yizhe Zhang, Xiujun Li, Mengdi Wang, Jianfeng Gao
Generating coherent and cohesive long-form texts is a challenging problem in natural language generation.
no code implementations • 25 Sep 2019 • Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
Modern complex sequential decision-making problems often require both a low-level policy and high-level planning.
no code implementations • L4DC 2020 • Hao Gong, Mengdi Wang
In light of the Bellman duality, we propose a novel value-policy gradient algorithm to explore and act in infinite-horizon Average-reward Markov Decision Process (AMDP) and show that it has sublinear regret.
no code implementations • 31 Jan 2022 • Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.
no code implementations • 10 Feb 2022 • Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang
We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.
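To make the FQE procedure concrete: fitted Q-evaluation repeatedly regresses $Q$ onto the target $r + \gamma Q(s', \pi(s'))$ over logged transitions. The sketch below assumes a tabular function class, where the least-squares fit reduces to a per-$(s, a)$ average; the paper's results concern general function classes, and the function name and data layout are hypothetical.

```python
def fitted_q_evaluation(dataset, target_policy, n_states, n_actions, gamma, n_iter=200):
    """Tabular FQE over logged (s, a, r, s_next) tuples from some behavior policy.

    target_policy[s] is the action the evaluated policy takes in state s.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iter):
        sums = [[0.0] * n_actions for _ in range(n_states)]
        counts = [[0] * n_actions for _ in range(n_states)]
        for s, a, r, s_next in dataset:
            # Regression target uses the previous iterate and the target policy's action.
            sums[s][a] += r + gamma * Q[s_next][target_policy[s_next]]
            counts[s][a] += 1
        # Tabular least squares = empirical mean; unvisited pairs keep their old value.
        Q = [[sums[s][a] / counts[s][a] if counts[s][a] else Q[s][a]
              for a in range(n_actions)] for s in range(n_states)]
    return Q
```

The distribution shift discussed in the abstract enters through which $(s, a)$ pairs the behavior policy visits: pairs the target policy reaches but the data rarely covers are estimated poorly or not at all.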
no code implementations • 11 Mar 2022 • Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang
However, a precise understanding of the statistical limits with function representations remains elusive, even when such a representation is linear.
no code implementations • 1 Jun 2022 • Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu
We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.
no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.
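TS-DE applies Thompson sampling to directed evolution over sequences. The generic Beta-Bernoulli Thompson sampling loop below, over a few hypothetical candidates with binary noisy outcomes, illustrates only the posterior-sampling step (draw a plausible value per candidate, query the best draw, update); it is not the paper's framework, and the function name and reward model are assumptions.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was queried."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha, beta = [1] * k, [1] * k  # Beta(1, 1) uniform priors
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its posterior and act greedily on the draws.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_means[arm] else 0  # noisy binary measurement
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Because uncertain candidates occasionally produce high draws, they keep being explored until their posteriors concentrate, after which most queries go to the best candidate.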
no code implementations • 6 Jun 2022 • Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks.
no code implementations • 10 Jun 2022 • Ming Yin, Wenjing Chen, Mengdi Wang, Yu-Xiang Wang
Goal-oriented Reinforcement Learning, where the agent needs to reach the goal state while simultaneously minimizing the cost, has received significant attention in real-world applications.
no code implementations • 10 Jun 2022 • Chuanhao Li, Huazheng Wang, Mengdi Wang, Hongning Wang
We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.
no code implementations • 22 Jun 2022 • Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task, multi-agent learning and federated learning.
no code implementations • 29 Jun 2022 • Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang
Online influence maximization aims to maximize the influence spread of a piece of content in a social network with an unknown network model by selecting a few seed nodes.
no code implementations • 3 Oct 2022 • Ming Yin, Mengdi Wang, Yu-Xiang Wang
Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications.
no code implementations • 30 Oct 2022 • Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang
To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.
no code implementations • 2 Nov 2022 • Le Xie, Tong Huang, Xiangtian Zheng, Yan Liu, Mengdi Wang, Vijay Vittal, P. R. Kumar, Srinivas Shakkottai, Yi Cui
The transition towards carbon-neutral electricity is one of the biggest game changers in addressing climate change since it addresses the dual challenges of removing carbon emissions from the two largest sectors of emitters: electricity and transportation.
no code implementations • 1 Dec 2022 • Jinghan Wang, Mengdi Wang, Lin F. Yang
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator).
no code implementations • 28 Jan 2023 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha
Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse.
no code implementations • 14 Feb 2023 • Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang
Furthermore, the generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
no code implementations • 23 May 2023 • Kaiyan Chang, Ying Wang, Haimeng Ren, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li
As large language models (LLMs) such as ChatGPT have exhibited unprecedented machine intelligence, they have also shown strong performance in assisting hardware engineers to achieve higher-efficiency logic design via natural language interaction.
no code implementations • 23 May 2023 • Efe Bozkir, Süleyman Özdel, Mengdi Wang, Brendan David-John, Hong Gao, Kevin Butler, Eakta Jain, Enkelejda Kasneci
The latest developments in computer hardware, sensor technologies, and artificial intelligence can make virtual reality (VR) and virtual spaces an important part of everyday human life.
no code implementations • 29 May 2023 • Zihao Li, Zhuoran Yang, Mengdi Wang
In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where we aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
no code implementations • 30 May 2023 • Zichen Wang, Rishab Balasubramanian, Hui Yuan, Chenyu Song, Mengdi Wang, Huazheng Wang
We propose the first study of adversarial attacks on online learning to rank.
no code implementations • 2 Jun 2023 • Minshuo Chen, Jie Meng, Yu Bai, Yinyu Ye, H. Vincent Poor, Mengdi Wang
We present algorithms and establish near-optimal regret upper and lower bounds, of the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H) SAK})$, for RL in the delayed and missing observation settings.
no code implementations • 21 Jun 2023 • Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang
In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.
no code implementations • 26 Jun 2023 • Zixuan Zhang, Minshuo Chen, Mengdi Wang, Wenjing Liao, Tuo Zhao
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to the intrinsic data structures.
no code implementations • 4 Jul 2023 • Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang
Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom.
no code implementations • 5 Jul 2023 • Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations.
no code implementations • 6 Jul 2023 • Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu Bai
This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case.
no code implementations • 24 Jul 2023 • Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang
A popular approach is to utilize human feedback to learn a reward function for training.
no code implementations • 26 Jul 2023 • Siyu Chen, Mengdi Wang, Zhuoran Yang
The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data.
no code implementations • 3 Aug 2023 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang
We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback.
no code implementations • 15 Sep 2023 • Yikuan Li, Chengsheng Mao, Kaixuan Huang, Hanyin Wang, Zheng Yu, Mengdi Wang, Yuan Luo
Scarcity of health care resources could result in the unavoidable consequence of rationing.
no code implementations • 25 Sep 2023 • Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0, 1]$ indicates the smoothness of environment.
no code implementations • 5 Oct 2023 • Yanyi Chu, Dan Yu, Yupeng Li, Kaixuan Huang, Yue Shen, Le Cong, Jason Zhang, Mengdi Wang
The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level.
no code implementations • 10 Oct 2023 • Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization.
no code implementations • 16 Oct 2023 • Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang
In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve evaluating an intermediate policy over a reward learned from human preference data with distribution shift, known as off-policy evaluation (OPE).
no code implementations • 29 Nov 2023 • Lei Zhao, Mengdi Wang, Yu Bai
Inverse Reinforcement Learning (IRL), the problem of learning reward functions from demonstrations of an expert policy, plays a critical role in developing intelligent systems.
no code implementations • 8 Jan 2024 • Joseph C. Kim, David Bloore, Karan Kapoor, Jun Feng, Ming-Hong Hao, Mengdi Wang
We demonstrate that standard architectures and training strategies, such as maximum likelihood alone, fail while our novel architecture and multi-stage training strategy are able to model the conformational distributions of protein G and HP35.
no code implementations • 8 Jan 2024 • Jiahao Qiu, Hui Yuan, Jinghong Zhang, Wentao Chen, Huazheng Wang, Mengdi Wang
To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model.
no code implementations • 6 Feb 2024 • Efe Bozkir, Süleyman Özdel, Ka Hei Carrie Lau, Mengdi Wang, Hong Gao, Enkelejda Kasneci
Lastly, we speculate that combining the information provided to LLM-powered environments by the users and the biometric data obtained through the sensors might lead to novel privacy invasions.
no code implementations • 7 Feb 2024 • Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels.
no code implementations • 14 Feb 2024 • Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
no code implementations • 16 Feb 2024 • Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang
Designing algorithms for a constrained convex MDP faces several challenges, including (1) handling the large state space, (2) managing the exploration/exploitation tradeoff, and (3) solving the constrained optimization where the objective and the constraint are both nonlinear functions of the visitation measure.
no code implementations • 3 Mar 2024 • Yuchen Wu, Minshuo Chen, Zihao Li, Mengdi Wang, Yuting Wei
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
no code implementations • 7 Mar 2024 • Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara
In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.
no code implementations • 17 Mar 2024 • Kaiyan Chang, Kun Wang, Nan Yang, Ying Wang, Dantong Jin, Wenlong Zhu, Zhirong Chen, Cangyuan Li, Hao Yan, Yunhao Zhou, Zhuoliang Zhao, Yuan Cheng, Yudong Pan, Yiqi Liu, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li
Our 13B model (ChipGPT-FT) achieves a higher pass rate than GPT-3.5 in Verilog generation and outperforms it in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script samples.
no code implementations • 18 Mar 2024 • Haque Ishfaq, Thanh Nguyen-Tang, Songtao Feng, Raman Arora, Mengdi Wang, Ming Yin, Doina Precup
We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation.
no code implementations • 18 Mar 2024 • Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning.
no code implementations • 19 Mar 2024 • Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang
Large Language Models (LLMs) have emerged as integral tools for reasoning, planning, and decision-making, drawing upon their extensive world knowledge and proficiency in language-related tasks.
no code implementations • 20 Mar 2024 • Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang
In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables.
no code implementations • 11 Apr 2024 • Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang
In this paper, we review emerging applications of diffusion models and examine their sample generation under various controls.