Search Results for author: Long Yang

Found 17 papers, 4 papers with code

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

no code implementations • 24 May 2022 • Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Bo Yuan, Xueqian Wang, DaCheng Tao

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications.

reinforcement-learning, Safe Reinforcement Learning

A Review of Safe Reinforcement Learning: Methods, Theory and Applications

1 code implementation • 20 May 2022 • Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois Knoll

To establish a good foundation for future research in this thread, we provide a review of safe RL from the perspectives of methods, theory, and applications.

Autonomous Driving, Decision Making, +2 more

CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning

1 code implementation • 15 Feb 2022 • Long Yang, Jiaming Ji, Juntao Dai, Yu Zhang, Pengfei Li, Gang Pan

Although using bounds as surrogate functions to design safe RL algorithms has appeared in some existing works, we develop them in at least three aspects: (i) We provide a rigorous theoretical analysis to extend the surrogate functions to the generalized advantage estimator (GAE).

reinforcement-learning, Safe Exploration, +1 more

Secure Hybrid Beamforming for IRS-Assisted Millimeter Wave Systems

no code implementations • 9 Jan 2022 • Xuan Xue, Yongchao Wang, Long Yang, Jian Chen

In this paper, we investigate the secure beamforming design in an intelligent reflection surface (IRS) assisted millimeter wave (mmWave) system, where the hybrid beamforming (HB) and the passive beamforming (PB) are employed by the transmitter and the IRS, respectively.

Thompson Sampling for Unimodal Bandits

no code implementations • 15 Jun 2021 • Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms.

On Convergence of Gradient Expected Sarsa($λ$)

no code implementations • 14 Dec 2020 • Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear function approximation.

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

no code implementations • 2 Dec 2020 • Long Yang, Qian Zheng, Gang Pan

However, due to the inherent non-concavity of its objective, convergence to a first-order stationary point (FOSP) cannot guarantee that policy gradient methods find a maximal point.

Policy Gradient Methods

Gradient Q$(σ, λ)$: A Unified Algorithm with Function Approximation for Reinforcement Learning

no code implementations • 6 Sep 2019 • Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan

To address the above problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation.

Q-Learning, reinforcement-learning

FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

no code implementations • 1 Jul 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Zheng, Gang Pan

Alternatively, derivative-free methods treat the optimization process as a black box and show robustness and stability in learning continuous control tasks, but they are not data-efficient.

Continuous Control, reinforcement-learning

Expected Sarsa($λ$) with Control Variate for Variance Reduction

no code implementations • 25 Jun 2019 • Long Yang, Yu Zhang, Jun Wen, Qian Zheng, Pengfei Li, Gang Pan

In this paper, to reduce variance, we introduce the control variate technique to $\mathtt{Expected~Sarsa}(\lambda)$ and propose a tabular $\mathtt{ES}(\lambda)$-$\mathtt{CV}$ algorithm.

TBQ($σ$): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning

no code implementations • 17 May 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Pan

However, existing off-policy learning methods based on probabilistic policy measurement are inefficient when utilizing traces under a greedy target policy, which is ineffective for control problems.


Beetle Swarm Optimization Algorithm: Theory and Application

1 code implementation • 1 Aug 2018 • Tiantian Wang, Long Yang

In this paper, a new meta-heuristic algorithm, called beetle swarm optimization algorithm, is proposed by enhancing the performance of swarm optimization through beetle foraging principles.

Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network

no code implementations • 14 Jun 2018 • Wenjia Meng, Qian Zheng, Long Yang, Pengfei Li, Gang Pan

In this paper, we propose a general framework, named R-DQN, that combines DQN with most return-based reinforcement learning algorithms.

OpenAI Gym, reinforcement-learning

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

no code implementations • 9 Feb 2018 • Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan

Results show that, with an intermediate value of $\sigma$, $Q(\sigma,\lambda)$ creates a mixture of the existing algorithms that can learn the optimal value significantly faster than either extreme ($\sigma=0$ or $\sigma=1$).


Distinguishing the Indistinguishable: Exploring Structural Ambiguities via Geodesic Context

1 code implementation • CVPR 2017 • Qingan Yan, Long Yang, Ling Zhang, Chunxia Xiao

A perennial problem in structure from motion (SfM) is visual ambiguity posed by repetitive structures.
