no code implementations • 18 Jul 2024 • Ally Yalei Du, Lin F. Yang, Ruosong Wang
In order to study whether an analogous result is possible in the reinforcement learning setting, we consider the following problem: assuming the optimal $Q$-function is a $d$-dimensional linear function with sparsity $k$ and misspecification error $\epsilon$, whether we can obtain an $O\left(\epsilon\right)$-optimal policy using a number of samples polynomial in the feature dimension $d$.
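One natural way to make the assumption precise (a sketch of the standard formulation, not necessarily the paper's exact statement): there is a known feature map $\phi$ and a parameter $\theta^* \in \mathbb{R}^d$ with $\|\theta^*\|_0 \le k$ such that $\sup_{s,a}\left|Q^*(s,a) - \phi(s,a)^\top \theta^*\right| \le \epsilon$, and the question is whether $\mathrm{poly}(d)$ samples suffice to find an $O(\epsilon)$-optimal policy.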
no code implementations • 26 Jun 2024 • Osama Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli
We consider a novel multi-armed bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors.
no code implementations • 26 Jun 2024 • Tian Tian, Lin F. Yang, Csaba Szepesvári
The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward.
no code implementations • 28 May 2024 • Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F. Yang, Anton Tsitsulin
Retrieval Augmented Generation (RAG) has greatly improved the performance of Large Language Model (LLM) responses by grounding generation with context from existing documents.
1 code implementation • 21 Dec 2023 • Osama A. Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli
To our knowledge, these are the first algorithms capable of effectively learning through heterogeneous action erasure channels.
no code implementations • 7 Dec 2023 • Jiayi Huang, Han Zhong, LiWei Wang, Lin F. Yang
To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed UCRL-WVTR, that achieves a regret bound that is both \emph{horizon-free} and \emph{instance-dependent}, i.e., it eliminates the polynomial dependency on the planning horizon.
no code implementations • 18 Sep 2023 • Haochen Zhang, Xi Chen, Lin F. Yang
The DRL policy aims to optimize trading fees earned by LPs against associated costs, such as gas fees and hedging expenses, which is referred to as loss-versus-rebalancing (LVR).
no code implementations • 11 Jul 2023 • Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang
Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}({d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})}\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards.
no code implementations • NeurIPS 2023 • Jiayi Huang, Han Zhong, LiWei Wang, Lin F. Yang
Our algorithm, termed as \textsc{Heavy-LSVI-UCB}, achieves the \emph{first} computationally efficient \emph{instance-dependent} $K$-episode regret of $\tilde{O}(d \sqrt{H \mathcal{U}^*} K^\frac{1}{1+\epsilon} + d \sqrt{H \mathcal{V}^* K})$.
no code implementations • 2 Jun 2023 • Masoud Monajatipoor, Liunian Harold Li, Mozhdeh Rouhsedaghat, Lin F. Yang, Kai-Wei Chang
In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain?
no code implementations • 18 Apr 2023 • Dingwen Kong, Lin F. Yang
We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher for only a few queries about the rewards of a task at some state-action pairs.
no code implementations • 29 Mar 2023 • Jialin Dong, Lin F. Yang
In particular, Du et al. (2020) show that even if a learner is given linear features in $\mathbb{R}^d$ that approximate the rewards in a bandit or RL problem with a uniform error of $\varepsilon$, searching for an $O(\varepsilon)$-optimal action requires at least $\Omega(\exp(d))$ queries.
no code implementations • 1 Dec 2022 • Jinghan Wang, Mengdi Wang, Lin F. Yang
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator).
no code implementations • 8 Nov 2022 • Osama A. Hanna, Lin F. Yang, Christina Fragouli
When the context distribution is unknown, we establish an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances.
no code implementations • 13 Jun 2022 • Sharan Vaswani, Lin F. Yang, Csaba Szepesvári
In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint.
no code implementations • 8 Jun 2022 • Osama A. Hanna, Lin F. Yang, Christina Fragouli
The contextual linear bandit is a rich and theoretically important model that has many practical applications.
no code implementations • 1 Jun 2022 • Sanae Amani, Lin F. Yang, Ching-An Cheng
We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks.
no code implementations • 26 May 2022 • Sanae Amani, Tor Lattimore, András György, Lin F. Yang
In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.
no code implementations • 11 Nov 2021 • Osama A. Hanna, Lin F. Yang, Christina Fragouli
Existing works usually fail to address this issue and can become infeasible in certain applications.
no code implementations • 1 Nov 2021 • Yuanzhi Li, Ruosong Wang, Lin F. Yang
Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed.
no code implementations • NeurIPS 2021 • Han Zhong, Jiayi Huang, Lin F. Yang, LiWei Wang
Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known in the case where moments of the error may not exist: the random noise $\eta$ satisfies $\Pr\left[|\eta| > |y|\right] \le 1/|y|^{\alpha}$ for some $\alpha > 0$.
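As a concrete illustration of this tail condition (a hedged example: symmetric Pareto noise is just one convenient distribution satisfying it, not necessarily the one used in the paper), the following snippet checks the bound empirically; for $\alpha \le 1$ even the first moment of such noise does not exist.

```python
import numpy as np

# Symmetric Pareto noise: P(|eta| > y) = y^(-alpha) for y >= 1, so the tail
# condition above holds with equality; for alpha <= 1 the mean is infinite.
rng = np.random.default_rng(0)
alpha, n = 0.8, 200_000

magnitude = rng.pareto(alpha, size=n) + 1.0        # survival function y^(-alpha) on [1, inf)
eta = rng.choice([-1.0, 1.0], size=n) * magnitude  # symmetrize the noise

for y in (2.0, 10.0, 100.0):
    empirical = np.mean(np.abs(eta) > y)
    print(f"P(|eta| > {y:6.1f}): empirical {empirical:.4f}  vs  y^-alpha {y ** -alpha:.4f}")
```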
no code implementations • 12 Oct 2021 • Weichao Mao, Lin F. Yang, Kaiqing Zhang, Tamer Başar
Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential sample complexity dependence on the number of agents, a phenomenon known as \emph{the curse of multiagents}.
no code implementations • 9 Oct 2021 • Junhong Shen, Lin F. Yang
To mitigate these issues, we propose a theoretically principled nearest neighbor (NN) function approximator that can improve the value networks in deep RL methods.
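A minimal sketch of what a nearest-neighbor value approximator can look like (an illustration of the general idea only, not the authors' construction; the brute-force search and default value are assumptions):

```python
import numpy as np

class NearestNeighborValue:
    """Store visited states with value estimates; answer queries with the
    value of the closest stored state (brute-force search for clarity;
    a k-d tree or similar index would be the practical choice)."""

    def __init__(self, default_value: float = 0.0):
        self.states: list[np.ndarray] = []
        self.values: list[float] = []
        self.default_value = default_value

    def add(self, state, value: float) -> None:
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(value))

    def __call__(self, state) -> float:
        if not self.states:
            return self.default_value
        dists = np.linalg.norm(np.stack(self.states) - np.asarray(state, dtype=float), axis=1)
        return self.values[int(np.argmin(dists))]

V = NearestNeighborValue()
V.add([0.0, 0.0], 1.0)
V.add([1.0, 1.0], 0.2)
print(V([0.1, -0.05]))  # -> 1.0, the value of the nearest stored state
```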
no code implementations • ICLR 2022 • Xiaoyu Chen, Jiachen Hu, Lin F. Yang, LiWei Wang
In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase and demand that \emph{any planning algorithm} on the learned model can give a near-optimal policy.
Model-based Reinforcement Learning • Reinforcement Learning (RL)
1 code implementation • 11 Aug 2021 • Jingfeng Wu, Vladimir Braverman, Lin F. Yang
In particular, for an unknown finite-horizon Markov decision process, the algorithm takes only $\widetilde{\mathcal{O}} (1/\epsilon \cdot (H^3SA / \rho + H^4 S^2 A) )$ episodes of exploration, and is able to obtain an $\epsilon$-optimal policy for a post-revealed reward with sub-optimality gap at least $\rho$, where $S$ is the number of states, $A$ is the number of actions, and $H$ is the length of the horizon, obtaining a nearly \emph{quadratic saving} in terms of $\epsilon$.
1 code implementation • 15 Jun 2021 • Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.
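For intuition, here is a minimal sketch of the randomized least-squares regression step that RLSVI-style methods are built on (a sketch of the general idea with linear features, not the paper's algorithm; the feature map, ridge parameter, and noise scale are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 200

# Stand-ins for observed features phi(s_i, a_i) and regression targets
# r_i + max_a Q_next(s'_i, a) collected so far.
Phi = rng.normal(size=(n, d))
targets = Phi @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

lam, sigma = 1.0, 0.5
cov = np.linalg.inv(Phi.T @ Phi + lam * np.eye(d))
theta_hat = cov @ Phi.T @ targets                                   # ridge regression fit
theta_tilde = rng.multivariate_normal(theta_hat, sigma ** 2 * cov)  # randomized draw

def q_value(phi_sa: np.ndarray) -> float:
    """Randomized Q-value estimate used for greedy action selection."""
    return float(phi_sa @ theta_tilde)

print(q_value(rng.normal(size=d)))
```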
no code implementations • 14 Jun 2021 • Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto\operatorname{poly}\log(K)$ times over $K$ episodes of running the RL algorithm, while still achieving a near-optimal regret bound.
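A minimal sketch of the standard rare-switching device behind such low-switching guarantees (a sketch of the generic determinant-doubling trick, not the paper's algorithm; the synthetic feature stream is an assumption): re-fit the policy only when the feature covariance has gained enough new information, which happens at most $O(d \log K)$ times.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 4, 5000

Lambda = np.eye(d)                                   # regularized feature covariance
last_logdet = np.linalg.slogdet(Lambda)[1]
num_switches = 0

for k in range(K):
    phi = rng.normal(size=d) / np.sqrt(d)            # feature of the pair visited at episode k
    Lambda += np.outer(phi, phi)
    logdet = np.linalg.slogdet(Lambda)[1]
    if logdet > last_logdet + np.log(2):             # determinant doubled since last update
        num_switches += 1                            # ...re-fit the value function / policy here
        last_logdet = logdet

print(f"{num_switches} policy updates over {K} episodes (order d*log K = {d * np.log(K):.0f})")
```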
no code implementations • 11 Jun 2021 • Jialin Dong, Da Zheng, Lin F. Yang, George Karypis
This global cache allows in-GPU importance sampling of mini-batches, which drastically reduces the number of nodes in a mini-batch, especially in the input layer, thereby reducing data copying between CPU and GPU and mini-batch computation, without compromising the training convergence rate or model accuracy.
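The following sketch illustrates the caching idea in isolation (a hedged illustration, not the paper's implementation: the cache size, the degree-based choice of hot nodes, and the synthetic data are assumptions): features of frequently accessed nodes stay resident on the GPU, and only cache misses are copied from host memory per mini-batch.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

num_nodes, feat_dim, cache_size = 100_000, 128, 10_000
cpu_feats = torch.randn(num_nodes, feat_dim)     # full feature matrix in host memory
degrees = torch.randint(1, 50, (num_nodes,))     # stand-in for true node degrees

# Cache the highest-degree nodes: they appear in the most sampled mini-batches.
hot_ids = torch.topk(degrees, cache_size).indices
gpu_cache = cpu_feats[hot_ids].to(device)
pos_in_cache = torch.full((num_nodes,), -1, dtype=torch.long)
pos_in_cache[hot_ids] = torch.arange(cache_size)

def gather_features(batch_ids: torch.Tensor) -> torch.Tensor:
    """Assemble mini-batch features, copying only cache misses host-to-device."""
    slots = pos_in_cache[batch_ids]
    hit = slots >= 0
    out = torch.empty(len(batch_ids), feat_dim, device=device)
    out[hit] = gpu_cache[slots[hit]]                    # already on device
    out[~hit] = cpu_feats[batch_ids[~hit]].to(device)   # host-to-device copy for misses only
    return out

batch = torch.randint(0, num_nodes, (1024,))
feats = gather_features(batch)
print(feats.shape, "cache hit rate:", float((pos_in_cache[batch] >= 0).float().mean()))
```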
no code implementations • 11 Jun 2021 • Sanae Amani, Christos Thrampoulidis, Lin F. Yang
Safety in reinforcement learning has become increasingly important in recent years.
1 code implementation • 22 Mar 2021 • Fei Feng, Wotao Yin, Alekh Agarwal, Lin F. Yang
Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces.
no code implementations • 25 Feb 2021 • Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao
We establish an upper bound of $O(|\mathcal{S}|H^{3/2}/N)$ on the suboptimality of the Mimic-MD algorithm of Rajaraman et al. (2020), which we prove to be computationally efficient.
no code implementations • 2 Jan 2021 • Minbo Gao, Tianle Xie, Simon S. Du, Lin F. Yang
This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al 2019, Jin et al 2020] where the linear function approximation is used for generalization on the large state space.
no code implementations • 29 Nov 2020 • Qiwen Cui, Lin F. Yang
The empirical success of multi-agent reinforcement learning is encouraging, but few theoretical guarantees have been established.
Multi-agent Reinforcement Learning • Reinforcement Learning
1 code implementation • NeurIPS 2021 • Jingfeng Wu, Vladimir Braverman, Lin F. Yang
We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product of a preference vector with pre-specified multi-objective reward functions.
Multi-Objective Reinforcement Learning • Reinforcement Learning
no code implementations • 3 Nov 2020 • Tianyu Wang, Lin F. Yang, Zizhuo Wang
In this paper, we consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph, and the agent (i) initiates random walks over the graph by pulling arms, (ii) observes the random walk trajectories, and (iii) receives rewards equal to the lengths of the walks.
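To make the setup concrete, here is a toy simulation of the environment only (a hedged illustration, not the paper's algorithm; the small graph and the absorbing target node are assumptions): pulling an arm launches a random walk from that node, the trajectory is observed, and the walk length is the feedback.

```python
import random

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: []}  # node 4 absorbs the walk
TARGET = 4

def pull(arm: int, rng: random.Random):
    """Start a random walk at `arm`; return the observed trajectory and its length."""
    path = [arm]
    while path[-1] != TARGET:
        path.append(rng.choice(adj[path[-1]]))
    return path, len(path) - 1

rng = random.Random(0)
for arm in range(4):
    lengths = [pull(arm, rng)[1] for _ in range(2000)]
    print(f"arm {arm}: mean walk length ~ {sum(lengths) / len(lengths):.2f}")
```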
no code implementations • 3 Nov 2020 • Tianyu Wang, Lin F. Yang
Consequently, the sample complexity of our algorithm only depends on the rank, $m$, rather than the ambient dimension, $d$, which can be orders-of-magnitude larger.
no code implementations • NeurIPS 2020 • Qiwen Cui, Lin F. Yang
However, the understanding of the sample optimality of model-based RL is still largely missing, even for the linear case.
no code implementations • NeurIPS 2020 • Nived Rajaraman, Lin F. Yang, Jiantao Jiao, Kannan Ramchandran
Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy.
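A minimal sketch of the "mimic the expert whenever possible" policy in code (an illustration of the idea only; the dataset format and the fallback action on unseen states are assumptions):

```python
# Behavior cloning in its simplest form: replay the expert's action on states
# seen in the demonstrations, fall back to an arbitrary action elsewhere.
expert_data = [("s0", "left"), ("s1", "right"), ("s0", "left"), ("s2", "left")]

expert_action: dict = {}
for state, action in expert_data:
    expert_action.setdefault(state, action)   # keep one expert action per visited state

def mimic_policy(state: str, default_action: str = "left") -> str:
    """Play the expert's action where it is known, otherwise an arbitrary default."""
    return expert_action.get(state, default_action)

print(mimic_policy("s1"), mimic_policy("s_unseen"))
```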
1 code implementation • ICML 2020 • Jingfeng Wu, Vladimir Braverman, Lin F. Yang
In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco.
no code implementations • NeurIPS 2020 • Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang
This is in contrast to the usual reward-aware setting, with a $\tilde\Omega(|S|(|A|+|B|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound, where this model-based approach is near-optimal with only a gap on the $|A|,|B|$ dependence.
Model-based Reinforcement Learning • Reinforcement Learning (RL)
no code implementations • NeurIPS 2020 • Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov
The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions.
no code implementations • NeurIPS 2020 • Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski
If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability.
no code implementations • 16 Jun 2020 • Kunhe Yang, Lin F. Yang, Simon S. Du
This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve a logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly positive sub-optimality gap in the optimal $Q$-function.
no code implementations • ICML 2020 • Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang
We propose a model-based RL algorithm based on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected is constructed.
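A toy sketch of the optimism-over-consistent-models loop on a tiny tabular MDP (a hedged illustration: the finite candidate class, the plain transition-frequency consistency check, and all constants are assumptions; the paper's algorithm uses value-targeted regression instead):

```python
import numpy as np

S, A, H = 3, 2, 5
rng = np.random.default_rng(0)
reward = rng.uniform(size=(S, A))

def random_model():
    P = rng.uniform(size=(S, A, S))
    return P / P.sum(axis=-1, keepdims=True)

true_P = random_model()
candidates = [true_P] + [random_model() for _ in range(9)]   # finite model class

def optimal_value(P):
    """Finite-horizon value iteration under model P; returns V_1 at the start state."""
    V = np.zeros(S)
    for _ in range(H):
        V = np.max(reward + P @ V, axis=-1)
    return V[0]

# Collect transitions from the true MDP with a uniform exploration policy.
data = []
for _ in range(500):
    s = 0
    for _ in range(H):
        a = rng.integers(A)
        s_next = rng.choice(S, p=true_P[s, a])
        data.append((s, a, s_next))
        s = s_next

counts = np.zeros((S, A, S))
for s, a, s_next in data:
    counts[s, a, s_next] += 1
emp_P = counts / np.maximum(counts.sum(axis=-1, keepdims=True), 1)

# "Consistent" models: close to the empirical transition frequencies everywhere.
threshold = 0.3
consistent = [P for P in candidates
              if np.abs(P - emp_P).sum(axis=-1).max() <= threshold]

# Optimism: plan with the consistent model that promises the highest value.
best = max(consistent, key=optimal_value)
print(f"{len(consistent)} consistent models; optimistic value {optimal_value(best):.3f}")
```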
no code implementations • NeurIPS 2020 • Ruosong Wang, Ruslan Salakhutdinov, Lin F. Yang
Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL).
no code implementations • 1 May 2020 • Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade
Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.
1 code implementation • NeurIPS 2020 • Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems [tang2017exploration, bellemare2016unifying], we investigate when this paradigm is provably efficient.
no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
We show that our approach obtains small error and is efficient in both space and time.
no code implementations • 23 Feb 2020 • Gabriel I. Fernandez, Colin Togashi, Dennis W. Hong, Lin F. Yang
In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems.
no code implementations • 6 Dec 2019 • Fei Feng, Wotao Yin, Lin F. Yang
In particular, we provide an algorithm that uses $\widetilde{O}(N/(1-\gamma)^3/\varepsilon^2)$ samples in a generative model to learn an $\varepsilon$-optimal policy, where $\gamma$ is the discount factor and $N$ is the number of near-optimal actions in the approximate model.
no code implementations • 30 Oct 2019 • Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.
no code implementations • ICLR 2020 • Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang
With regard to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions that permit sample-efficient reinforcement learning, with little understanding of which conditions are necessary for efficient reinforcement learning.
no code implementations • NeurIPS 2019 • Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong
When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
no code implementations • 25 Sep 2019 • Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang
Modern complex sequential decision-making problems often involve both low-level policy learning and high-level planning.
no code implementations • 29 Aug 2019 • Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye
In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.
no code implementations • 10 Jun 2019 • Alekh Agarwal, Sham Kakade, Lin F. Yang
In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP.
Model-based Reinforcement Learning • Reinforcement Learning
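A minimal sketch of this plug-in pipeline on a toy discounted tabular MDP (a hedged illustration; the MDP, the number of generative-model samples per state-action pair, and plain value iteration as the planner are assumptions):

```python
import numpy as np

S, A, gamma, N = 4, 2, 0.9, 200
rng = np.random.default_rng(1)
reward = rng.uniform(size=(S, A))
true_P = rng.dirichlet(np.ones(S), size=(S, A))   # true transition kernel, shape (S, A, S)

# Maximum-likelihood (empirical count) transition model from N samples per (s, a).
counts = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        samples = rng.choice(S, size=N, p=true_P[s, a])
        np.add.at(counts[s, a], samples, 1)
P_hat = counts / N

def value_iteration(P, iters=500):
    """Plan in the given model; return the value function and a greedy policy."""
    V = np.zeros(S)
    for _ in range(iters):
        V = np.max(reward + gamma * P @ V, axis=-1)
    return V, np.argmax(reward + gamma * P @ V, axis=-1)

V_hat, policy = value_iteration(P_hat)     # plan in the empirical MDP
V_true, _ = value_iteration(true_P)
print("policy from the empirical MDP:", policy)
print("max value estimation gap:", float(np.abs(V_hat - V_true).max()))
```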
no code implementations • 2 Jun 2019 • Zeyu Jia, Lin F. Yang, Mengdi Wang
Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space.
no code implementations • ICML 2020 • Lin F. Yang, Mengdi Wang
In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space.
1 code implementation • 5 May 2019 • Lin F. Yang, Chengzhuo Ni, Mengdi Wang
We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces.
no code implementations • 13 Feb 2019 • Lin F. Yang, Mengdi Wang
Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model.
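One common way to formalize such a feature assumption (a sketch in the spirit of this line of work; the paper's exact parametrization may differ) is $P(s' \mid s, a) = \phi(s,a)^{\top} M^{*} \psi(s')$, where $\phi(s,a) \in \mathbb{R}^{d}$ and $\psi(s') \in \mathbb{R}^{d'}$ are known feature maps and $M^{*} \in \mathbb{R}^{d \times d'}$ is an unknown core matrix to be estimated.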
no code implementations • 26 Dec 2018 • Yibo Lin, Zhao Song, Lin F. Yang
In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets.
no code implementations • 13 Jun 2018 • Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao
However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown.
1 code implementation • 5 Jun 2018 • Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.
Optimization and Control
no code implementations • 26 Feb 2018 • Sham Kakade, Mengdi Wang, Lin F. Yang
There is a technical issue in the analysis that is not easily fixable.
no code implementations • 1 Feb 2018 • Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong
We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted to or deleted from the dataset.
no code implementations • 18 Dec 2017 • Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.
no code implementations • ICML 2017 • Zhehui Chen, Lin F. Yang, Chris Junchi Li, Tuo Zhao
Multiview representation learning is popular for latent factor analysis.
no code implementations • 19 Jun 2017 • Xingguo Li, Lin F. Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao
We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.
no code implementations • 22 May 2017 • Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang
We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.
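For reference, here is a minimal sketch of the classical generalized Hebbian (Sanger's) update that such streaming factorization methods build on (the textbook PCA rule only, not the paper's stochastic factorization algorithm; the synthetic data stream and step size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, eta = 6, 2, 0.005

# Synthetic data stream with two dominant directions.
basis = np.linalg.qr(rng.normal(size=(d, d)))[0]
scales = np.array([3.0, 2.0, 0.3, 0.2, 0.1, 0.1])

W = 0.1 * rng.normal(size=(k, d))
for _ in range(20_000):
    x = basis @ (scales * rng.normal(size=d))                    # one streaming sample
    y = W @ x
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)    # Sanger's rule

# Rows of W should align (up to sign) with the top-2 directions of the stream.
print(np.round(np.abs(basis[:, :2].T @ W.T), 2))
```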
no code implementations • 27 Feb 2017 • Zhehui Chen, Lin F. Yang, Chris J. Li, Tuo Zhao
Multiview representation learning is very popular for latent factor analysis.
no code implementations • NeurIPS 2018 • Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao
We use differential equations based approaches to provide some {\it \textbf{physics}} insights into analyzing the dynamics of popular optimization algorithms in machine learning.