no code implementations • 8 Jul 2024 • Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers
3) We also generate a large-scale training dataset via a scalable pipeline, which can be used to boost grasping performance under occlusion and to generalize to the real world.
no code implementations • 4 May 2024 • Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan
In this paper, we propose an off-policy policy gradient method with the optimal action-dependent baseline (Off-OAB) to mitigate this variance issue.
1 code implementation • 19 Apr 2024 • Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas Jing Yuan, Hui Xiong
In this paper, we propose a FLexible And Generalizable RL framework for VNE, named FlagVNE.
no code implementations • 14 Mar 2024 • Dali Zhu, Wenli Zhang, Hualin Zeng, Xiaohao Liu, Long Yang, Jiaqi Zheng
The remote photoplethysmography (rPPG) technique extracts blood volume pulse (BVP) signals from subtle pixel changes in video frames.
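For context, a minimal sketch of the classic colour-trace approach to rPPG (a simple baseline, not the method proposed in this paper); the function name and the 0.7-4 Hz heart-rate band are illustrative assumptions:

```python
# A simple colour-trace rPPG baseline: spatially average the green channel
# of a face region per frame, then band-pass filter to the heart-rate band.
import numpy as np
from scipy.signal import butter, filtfilt

def rppg_green_trace(frames, fps):
    """frames: array of shape (T, H, W, 3) in RGB order; fps: frame rate."""
    trace = frames[:, :, :, 1].mean(axis=(1, 2))  # mean green intensity per frame
    trace = trace - trace.mean()                  # remove the DC component
    b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")
    return filtfilt(b, a, trace)                  # approximate BVP signal
```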
1 code implementation • 2023 IEEE International Conference on Multimedia and Expo (ICME) 2023 • Gao Rui, Wan Fan, Organisciak Daniel, Pu Jiyao, Duan Haoran, Zhang Peng, Hou Xingsong, Long Yang
In addition, we discuss the teacher model in both omniscient and quasi-omniscient settings according to the knowledge space.
no code implementations • 5 Jun 2023 • Long Yang
In this lecture, we present a general perspective on reinforcement learning (RL) objectives and show three versions of the objective.
1 code implementation • 22 May 2023 • Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin
Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which limits the expressiveness of complicated policies and weakens exploration.
3 code implementations • 15 Sep 2022 • Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan
Compared to previous safe RL methods, CUP enjoys several benefits: 1) it generalizes the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance.
no code implementations • 24 May 2022 • Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Bo Yuan, Xueqian Wang, DaCheng Tao
Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications.
1 code implementation • 20 May 2022 • Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Alois Knoll
To establish a good foundation for future safe RL research, in this paper, we provide a review of safe RL from the perspectives of methods, theories, and applications.
1 code implementation • 15 Feb 2022 • Long Yang, Jiaming Ji, Juntao Dai, Yu Zhang, Pengfei Li, Gang Pan
Although using bounds as surrogate functions to design safe RL algorithms has appeared in some existing works, we develop them in at least three aspects: (i) we provide a rigorous theoretical analysis to extend the surrogate functions to the generalized advantage estimator (GAE).
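Since both CUP entries lean on GAE, here is a minimal sketch of the standard GAE recursion (Schulman et al.), shown for reference rather than as CUP's implementation; the function name and default coefficients are illustrative:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimator.
    rewards: shape (T,); values: shape (T+1,), including a bootstrap
    value for the final state."""
    T = len(rewards)
    adv = np.zeros(T)
    gae_t = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        gae_t = delta + gamma * lam * gae_t  # discounted sum of residuals
        adv[t] = gae_t
    return adv
```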
no code implementations • 9 Jan 2022 • Long Yang, Jiangtao Wang, Xuan Xue, Jia Shi, Yongchao Wang
In this paper, we investigate secure beamforming design in an intelligent reflection surface (IRS)-assisted millimeter wave (mmWave) system, where hybrid beamforming (HB) and passive beamforming (PB) are employed by the transmitter and the IRS, respectively.
no code implementations • 15 Jun 2021 • Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen
In this paper, we propose a Thompson Sampling algorithm for \emph{unimodal} bandits, where the expected reward is unimodal over the partially ordered arms.
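For reference, a minimal sketch of standard Beta-Bernoulli Thompson Sampling; note it treats arms independently and ignores the unimodal structure over partially ordered arms that the proposed algorithm exploits. The function name and uniform priors are illustrative:

```python
import numpy as np

def thompson_sampling(pull, n_arms, horizon, rng=np.random.default_rng(0)):
    """Standard Beta-Bernoulli Thompson Sampling; pull(arm) -> reward in {0, 1}."""
    successes = np.ones(n_arms)  # Beta(1, 1) priors
    failures = np.ones(n_arms)
    for _ in range(horizon):
        theta = rng.beta(successes, failures)  # one posterior sample per arm
        arm = int(np.argmax(theta))            # play the most promising sample
        r = pull(arm)
        successes[arm] += r
        failures[arm] += 1 - r
    return successes, failures
```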
no code implementations • 14 Dec 2020 • Long Yang, Gang Zheng, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear function approximation.
no code implementations • 2 Dec 2020 • Long Yang, Qian Zheng, Gang Pan
However, due to the inherent non-concavity of its objective, convergence to a first-order stationary point (FOSP) cannot guarantee that policy gradient methods find a maximal point.
no code implementations • 6 Sep 2019 • Long Yang, Yu Zhang, Qian Zheng, Pengfei Li, Gang Pan
To address the above problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation.
no code implementations • 1 Jul 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Zheng, Gang Pan
Alternatively, derivative-free methods treat the optimization process as a black box and show robustness and stability in learning continuous control tasks, but they are not data-efficient in learning.
no code implementations • 25 Jun 2019 • Long Yang, Yu Zhang, Jun Wen, Qian Zheng, Pengfei Li, Gang Pan
In this paper, to reduce the variance, we introduce the control variate technique to $\mathtt{Expected}$ $\mathtt{Sarsa}$($\lambda$) and propose a tabular $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ algorithm.
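For context, a minimal sketch of one tabular Expected Sarsa(λ) update with accumulating traces, the baseline that the proposed ES(λ)-CV variant augments with a control variate (the correction itself follows the paper and is not shown); the function name and table layout are illustrative:

```python
import numpy as np

def expected_sarsa_lambda_update(Q, E, s, a, r, s_next, pi, alpha, gamma, lam):
    """One step of tabular Expected Sarsa(lambda) with accumulating traces.
    Q, E: (n_states, n_actions) arrays; pi: (n_states, n_actions) policy table."""
    expected_q = np.dot(pi[s_next], Q[s_next])  # E_pi[Q(s', .)]
    delta = r + gamma * expected_q - Q[s, a]    # expected TD error
    E[s, a] += 1.0                              # accumulate the trace
    Q += alpha * delta * E                      # update every traced pair
    E *= gamma * lam                            # decay the traces
    return Q, E
```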
no code implementations • 25 Jun 2019 • Long Yang, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, Jun Wen, Gang Pan
Improving sample efficiency has been a longstanding goal in reinforcement learning.
no code implementations • 17 May 2019 • Longxiang Shi, Shijian Li, Longbing Cao, Long Yang, Gang Pan
However, existing off-policy learning methods based on probabilistic policy measurement are inefficient when utilizing traces under a greedy target policy, which makes them ineffective for control problems.
1 code implementation • 1 Aug 2018 • Tiantian Wang, Long Yang
In this paper, a new meta-heuristic algorithm, called the beetle swarm optimization algorithm, is proposed; it enhances the performance of swarm optimization through beetle foraging principles.
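For intuition, a minimal sketch of the single-beetle antennae rule from the beetle antennae search literature, i.e. the foraging principle the swarm variant builds on; this is not the full beetle swarm optimization algorithm, and all names and hyperparameters are illustrative:

```python
import numpy as np

def beetle_antennae_search(f, x0, steps=200, d=0.5, step=1.0, decay=0.95,
                           rng=np.random.default_rng(0)):
    """Single-beetle antennae search for minimizing f: probe both antennae
    along a random direction and step toward the side that smells better."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        b = rng.normal(size=x.shape)
        b /= np.linalg.norm(b) + 1e-12        # random antenna direction
        f_plus, f_minus = f(x + d * b), f(x - d * b)  # smell at both antennae
        x = x - step * b * np.sign(f_plus - f_minus)  # move toward the better side
        step *= decay                         # shrink the step over time
    return x
```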
no code implementations • 14 Jun 2018 • Wenjia Meng, Qian Zheng, Long Yang, Pengfei Li, Gang Pan
In this paper, we propose a general framework, named R-DQN, to combine DQN with most return-based reinforcement learning algorithms.
no code implementations • CVPR 2018 • Yanping Fu, Qingan Yan, Long Yang, Jie Liao, Chunxia Xiao
Acquiring realistic texture details for 3D models is important in 3D reconstruction.
no code implementations • 9 Feb 2018 • Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan
Results show that, with an intermediate value of $\sigma$, $Q(\sigma,\lambda)$ creates a mixture of the existing algorithms that can learn the optimal value significantly faster than either extreme ($\sigma=0$ or $\sigma=1$).
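To make the mixture concrete, a minimal sketch of the one-step Q(σ) backup target, where σ=1 recovers the sampled Sarsa target and σ=0 the Expected Sarsa target; the full Q(σ,λ) additionally uses eligibility traces. The function name and table layout are illustrative:

```python
import numpy as np

def q_sigma_target(r, Q, s_next, a_next, pi, sigma, gamma=0.99):
    """One-step Q(sigma) backup target: sigma mixes the sampled (Sarsa) and
    expected (Expected Sarsa) bootstrap values."""
    sampled = Q[s_next, a_next]               # Sarsa uses the sampled action
    expected = np.dot(pi[s_next], Q[s_next])  # Expected Sarsa averages over pi
    return r + gamma * (sigma * sampled + (1 - sigma) * expected)
```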
1 code implementation • CVPR 2017 • Qingan Yan, Long Yang, Ling Zhang, Chunxia Xiao
A perennial problem in structure from motion (SfM) is visual ambiguity posed by repetitive structures.