no code implementations • 24 May 2022 • Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Bo Yuan, Xueqian Wang, Dacheng Tao
Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications.
1 code implementation • 20 May 2022 • Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois Knoll
To establish a solid foundation for future research in this direction, we provide in this paper a review of safe RL from the perspectives of methods, theory, and applications.
1 code implementation • 15 Feb 2022 • Long Yang, Jiaming Ji, Juntao Dai, Yu Zhang, Pengfei Li, Gang Pan
Although using bounds as surrogate functions to design safe RL algorithms has appeared in some existing works, we develop them in at least three aspects: (i) we provide a rigorous theoretical analysis that extends the surrogate functions to the generalized advantage estimator (GAE).
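For context, the generalized advantage estimator (GAE) that the surrogate functions are extended to can be sketched as follows. This is a minimal version of the standard GAE backward recursion, not the paper's surrogate-bound construction; parameter names are illustrative:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimator over one trajectory.

    rewards: r_0..r_{T-1}; values: V(s_0)..V(s_T) (bootstrap value last).
    A_t = sum_{l>=0} (gamma*lam)^l * delta_{t+l}, where
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    """
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last  # backward accumulation
        adv[t] = last
    return adv
```

With `lam=0` this reduces to one-step TD errors; with `lam=1` it becomes the discounted return minus the baseline.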
no code implementations • 9 Jan 2022 • Xuan Xue, Yongchao Wang, Long Yang, Jian Chen
In this paper, we investigate the secure beamforming design in an intelligent reflection surface (IRS) assisted millimeter wave (mmWave) system, where the hybrid beamforming (HB) and the passive beamforming (PB) are employed by the transmitter and the IRS, respectively.
no code implementations • 15 Jun 2021 • Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen
In this paper, we propose a Thompson Sampling algorithm for \emph{unimodal} bandits, where the expected reward is unimodal over the partially ordered arms.
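The core idea — Thompson Sampling restricted to the neighborhood of the empirical leader on a line graph of arms — can be sketched roughly as below. This is a hedged illustration under Beta-Bernoulli assumptions, not the paper's exact algorithm:

```python
import numpy as np

def unimodal_ts(means, horizon=2000, seed=0):
    """Sketch: Thompson Sampling over only the empirical leader and its
    graph neighbors, exploiting unimodality of the expected rewards."""
    rng = np.random.default_rng(seed)
    K = len(means)
    s = np.ones(K)  # Beta posterior: successes + 1
    f = np.ones(K)  # Beta posterior: failures + 1
    pulls = np.zeros(K, dtype=int)
    for _ in range(horizon):
        emp = s / (s + f)
        leader = int(np.argmax(emp))
        cand = [a for a in (leader - 1, leader, leader + 1) if 0 <= a < K]
        samples = {a: rng.beta(s[a], f[a]) for a in cand}
        arm = max(samples, key=samples.get)
        r = rng.random() < means[arm]  # Bernoulli reward
        s[arm] += r
        f[arm] += 1 - r
        pulls[arm] += 1
    return pulls
```

Because the means are unimodal over the ordered arms, the leader's neighborhood always contains a direction of improvement, so sampling only three arms per round suffices.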
no code implementations • 14 Dec 2020 • Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear function approximation.
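A single update of the algorithm under study can be sketched as follows — an accumulating-trace Expected Sarsa(λ) step with linear function approximation; variable names and hyperparameters here are illustrative, not taken from the paper:

```python
import numpy as np

def expected_sarsa_lambda_step(w, z, phi_sa, r, phi_next_all, pi_next,
                               alpha=0.1, gamma=0.9, lam=0.8):
    """One accumulating-trace update: q(s,a) = w @ phi(s,a).

    phi_sa:        feature vector of the taken (s, a) pair.
    phi_next_all:  rows are phi(s', a') for every action a'.
    pi_next:       target-policy probabilities over a' at s'.
    """
    exp_q_next = pi_next @ (phi_next_all @ w)   # E_{a'~pi}[q(s', a')]
    delta = r + gamma * exp_q_next - w @ phi_sa  # expected TD error
    z = gamma * lam * z + phi_sa                 # eligibility trace
    w = w + alpha * delta * z
    return w, z
```

The expectation over next actions (rather than a sampled action, as in Sarsa) is what removes the sampling variance in the bootstrap target.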
no code implementations • 2 Dec 2020 • Long Yang, Qian Zheng, Gang Pan
However, due to the inherent non-concavity of its objective, convergence to a first-order stationary point (FOSP) cannot guarantee that policy gradient methods find a maximal point.
no code implementations • 6 Sep 2019 • Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
To address the above problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation.
no code implementations • 1 Jul 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Zheng, Gang Pan
Alternatively, derivative-free methods treat the optimization process as a black box and show robustness and stability in learning continuous control tasks, but they are not data efficient.
no code implementations • 25 Jun 2019 • Long Yang, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, Jun Wen, Gang Pan
Improving sample efficiency has been a longstanding goal in reinforcement learning.
no code implementations • 25 Jun 2019 • Long Yang, Yu Zhang, Jun Wen, Qian Zheng, Pengfei Li, Gang Pan
In this paper, to reduce the variance, we introduce the control variate technique into $\mathtt{Expected}$ $\mathtt{Sarsa}$($\lambda$) and propose a tabular $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ algorithm.
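The variance-reduction mechanism itself is classical. Below is a generic control-variate demonstration (a Monte Carlo estimate, not the paper's ES(λ)-CV algorithm) showing how subtracting a correlated quantity with known mean shrinks estimator variance:

```python
import numpy as np

def cv_estimate(x, c, c_mean):
    """Control-variate estimator mean(x - beta*(c - E[c])) with the
    (approximately) variance-minimizing beta = Cov(x, c) / Var(c)."""
    beta = np.cov(x, c)[0, 1] / np.var(c)
    return np.mean(x - beta * (c - c_mean)), beta

rng = np.random.default_rng(1)
c = rng.normal(0.0, 1.0, 10_000)       # control variate, known mean 0
x = c + rng.normal(0.0, 0.1, 10_000)   # correlated target quantity
est, beta = cv_estimate(x, c, 0.0)
```

Since `x` and `c` are strongly correlated, the corrected samples `x - beta*(c - 0)` have far smaller variance than `x` itself, so the mean estimate is much more accurate.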
no code implementations • 17 May 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Pan
However, existing off-policy learning methods based on probabilistic policy measurement are inefficient when utilizing traces under a greedy target policy, which limits their effectiveness in control problems.
1 code implementation • 1 Aug 2018 • Tiantian Wang, Long Yang
In this paper, we propose a new meta-heuristic algorithm, called the beetle swarm optimization algorithm, which enhances the performance of swarm optimization through beetle foraging principles.
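The underlying beetle-foraging mechanism can be sketched as a single-beetle antennae search: probe the objective at a left and a right "antenna" along a random direction and step toward the better side. This is a minimal illustration of that building block, with made-up step-decay parameters, not the paper's full swarm algorithm:

```python
import numpy as np

def beetle_antennae_search(f, x0, steps=200, d0=1.0, step0=1.0, seed=0):
    """Minimize f by repeatedly sensing f at two antennae (x +/- d*b) along
    a random unit direction b and moving toward the lower-valued side."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    step, d = step0, d0
    for _ in range(steps):
        b = rng.normal(size=x.size)
        b /= np.linalg.norm(b)            # random unit direction
        left, right = x + d * b, x - d * b
        x = x - step * b * np.sign(f(left) - f(right))
        step *= 0.97                       # shrink step and antenna length
        d *= 0.97
    return x

sphere = lambda v: float(np.sum(v ** 2))
```

Each beetle is thus a derivative-free local searcher; the swarm variant couples many such beetles with particle-swarm-style velocity updates.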
no code implementations • 14 Jun 2018 • Wenjia Meng, Qian Zheng, Long Yang, Pengfei Li, Gang Pan
In this paper, we propose a general framework, named R-DQN, that combines DQN with most return-based reinforcement learning algorithms.
no code implementations • CVPR 2018 • Yanping Fu, Qingan Yan, Long Yang, Jie Liao, Chunxia Xiao
Acquiring realistic texture details for 3D models is important in 3D reconstruction.
no code implementations • 9 Feb 2018 • Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan
Results show that, with an intermediate value of $\sigma$, Q$(\sigma,\lambda)$ creates a mixture of the existing algorithms that can learn the optimal value significantly faster than either extreme ($\sigma=0$ or $\sigma=1$).
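The role of $\sigma$ is easiest to see in the one-step backup target, sketched below: $\sigma$ linearly interpolates between the sampled Sarsa target and the expectation-based Expected Sarsa / Tree-Backup target. A hedged illustration of that interpolation, not the full multi-step Q$(\sigma,\lambda)$ algorithm:

```python
import numpy as np

def q_sigma_target(r, q_next, pi_next, a_next, sigma, gamma=0.99):
    """One-step Q(sigma) backup target.

    sigma=1 recovers the Sarsa target (uses the sampled action a_next);
    sigma=0 recovers the Expected Sarsa target (expectation under pi)."""
    sarsa = q_next[a_next]          # sampled next-action value
    expected = pi_next @ q_next     # E_{a'~pi}[q(s', a')]
    return r + gamma * (sigma * sarsa + (1.0 - sigma) * expected)
```

Intermediate $\sigma$ trades off the variance of sampling against the bias introduced when the expectation is taken under an inaccurate policy estimate, which is consistent with the mixture learning faster than either extreme.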
1 code implementation • CVPR 2017 • Qingan Yan, Long Yang, Ling Zhang, Chunxia Xiao
A perennial problem in structure from motion (SfM) is visual ambiguity posed by repetitive structures.