no code implementations • 11 Jun 2024 • Qining Zhang, Honghao Wei, Lei Ying

In this paper, we study reinforcement learning from human feedback (RLHF) under an episodic Markov decision process with a general trajectory-wise reward model.

no code implementations • 23 May 2024 • Minheng Xiao, Xian Yu, Lei Ying

However, developing policy gradient methods for risk-sensitive DRL is inherently more complex as it pertains to finding the gradient of a probability measure.

Distributional Reinforcement Learning
Policy Gradient Methods

no code implementations • 17 Mar 2024 • Zixian Yang, Lei Ying

We prove that our proposed algorithm yields a sublinear regret $\tilde{O}(T^{5/6})$ and queue-length bound $\tilde{O}(T^{2/3})$, where $T$ is the time horizon.

no code implementations • 26 Feb 2024 • Kellen Kanarios, Qining Zhang, Lei Ying

In this paper, we study a best arm identification problem with dual objects.

no code implementations • 22 Dec 2023 • Honghao Wei, Xin Liu, Lei Ying

This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step.

no code implementations • 27 Sep 2023 • Zihan Zhou, Honghao Wei, Lei Ying

PRI achieves three objectives: (i) it is a model-free algorithm; (ii) it outputs an approximately optimal policy with high probability at the end of learning; and (iii) it guarantees $\tilde{\mathcal{O}}(H\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(H^4 \sqrt{SA}K^{\frac{4}{5}})$ under a model-free algorithm, where $H$ is the length of each episode, $S$ is the number of states, $A$ is the number of actions, and the total number of episodes during learning is $2K+\tilde{\mathcal{O}}(K^{0.25})$. We further present a matching lower bound via an example showing that, under any online learning algorithm, there exists a well-separated CMDP instance such that either the regret or the violation has to be $\Omega(H\sqrt{K})$, which matches the upper bound up to a polylogarithmic factor.

no code implementations • NeurIPS 2023 • Qining Zhang, Lei Ying

This paper considers a stochastic Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds.

1 code implementation • 1 Jun 2023 • Ruizhong Qiu, Dingsu Wang, Lei Ying, H. Vincent Poor, Yifang Zhang, Hanghang Tong

They are exclusively based on the maximum likelihood estimation (MLE) formulation and require knowledge of the true diffusion parameters.

no code implementations • 10 Mar 2023 • Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, Xingyu Zhou

We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost).

no code implementations • 5 Feb 2023 • Xin Liu, Zixian Yang, Lei Ying

This subroutine also achieves the state-of-the-art regret and constraint violation bounds for constrained online convex optimization problems, which is of independent interest.

no code implementations • 26 Jan 2023 • Xian Yu, Lei Ying

Risk-sensitive reinforcement learning (RL) has become a popular tool to control the risk of uncertain outcomes and ensure reliable performance in various sequential decision-making problems.

no code implementations • 4 Jan 2023 • Kaiyi Ji, Lei Ying

In this paper, we provide a new solution using a distributed and data-driven bilevel optimization approach, where the lower level is a distributed network utility maximization (NUM) algorithm with concave surrogate utility functions, and the upper level is a data-driven learning algorithm to find the best surrogate utility functions that maximize the sum of true network utility.

no code implementations • 13 Dec 2022 • Xin Liu, Honghao Wei, Lei Ying

The proposed algorithm is distributed in two aspects: (i) the learned policy is a distributed policy that maps a local state of an agent to its local action and (ii) the learning/training is distributed, during which each agent updates its policy based on its own and neighbors' information.

Multi-agent Reinforcement Learning
reinforcement-learning

no code implementations • 2 Sep 2022 • Zixian Yang, R. Srikant, Lei Ying

We prove that under our algorithm the asymptotic average queue length is bounded by one divided by the traffic slackness, which is order-wise optimal.

no code implementations • 27 May 2022 • Kaiyi Ji, Mingrui Liu, Yingbin Liang, Lei Ying

Existing studies in the literature cover only some of those implementation choices, and the complexity bounds available are not refined enough to enable rigorous comparison among different implementations.

no code implementations • 26 May 2022 • Zixian Yang, Xin Liu, Lei Ying

To understand the exploration, exploitation, and engagement in these systems, we propose a new model, called MAB-A, where "A" stands for abandonment and the abandonment probability depends on the current recommended item and the user's past experience (called the state).

no code implementations • 13 Nov 2021 • Jueming Hu, Xuxi Yang, Weichang Wang, Peng Wei, Lei Ying, Yongming Liu

Obstacle avoidance for small unmanned aircraft is vital for the safety of future urban air mobility (UAM) and Unmanned Aircraft System (UAS) Traffic Management (UTM).

no code implementations • 3 Jun 2021 • Honghao Wei, Xin Liu, Lei Ying

This paper presents the first model-free, simulator-free reinforcement learning algorithm for Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint violation.

no code implementations • NeurIPS 2021 • Xin Liu, Bin Li, Pengyi Shi, Lei Ying

Thus, the overall computational complexity of our algorithm is similar to that of the linear UCB for unconstrained stochastic linear bandits.

no code implementations • 20 Oct 2020 • Xin Liu, Bin Li, Pengyi Shi, Lei Ying

This paper considers constrained online dispatching with unknown arrival, reward and constraint distributions.

2 code implementations • 4 Oct 2020 • Honghao Wei, Lei Ying

In this paper, we propose a new type of Actor, named forward-looking Actor or FORK for short, for Actor-Critic algorithms.

1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.

1 code implementation • NeurIPS 2019 • Harsh Gupta, R. Srikant, Lei Ying

We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC.

no code implementations • 4 Mar 2019 • Honghao Wei, Xiaohan Kang, Weina Wang, Lei Ying

The algorithm consists of an offline machine learning algorithm for learning the probabilistic information spreading model and an online optimal stopping algorithm to detect misinformation.

no code implementations • 3 Feb 2019 • R. Srikant, Lei Ying

We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., the deviation of the output of the algorithm from the equilibrium point of an associated ordinary differential equation (ODE).

no code implementations • 6 Mar 2014 • Kai Zhu, Rui Wu, Lei Ying, R. Srikant

In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users).

no code implementations • 1 Oct 2013 • Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure.
