no code implementations • 1 Jan 2023 • Wenhui Huang, Yanxin Zhou, Xiangkun He, Chen Lv
Despite some successful applications of goal-driven navigation, existing deep reinforcement learning-based approaches notoriously suffers from poor data efficiency issue.
no code implementations • 1 Jul 2022 • Jingda Wu, Wenhui Huang, Niels de Boer, Yanghui Mo, Xiangkun He, Chen Lv
Decisions made by human subjects in a driving simulator are treated as safe demonstrations, which are stored into the replay buffer and then utilized to enhance the training process of RL.
no code implementations • 20 Jun 2022 • Wenhui Huang, Cong Zhang, Jingda Wu, Xiangkun He, Jie Zhang, Chen Lv
We theoretically prove that the policy improvement theorem holds for the preference-guided $\epsilon$-greedy policy and experimentally show that the inferred action preference distribution aligns with the landscape of corresponding Q-values.
no code implementations • 1 Jan 2021 • Xiangkun He, Jianye Hao, Dong Li, Bin Wang, Wulong Liu
Thirdly, the agent’s learning process is regarded as a black-box, and the comprehensive metric we proposed is computed after each episode of training, then a Bayesian optimization (BO) algorithm is adopted to guide the agent to evolve towards improving the quality of the approximated Pareto frontier.
Multi-Objective Reinforcement Learning
reinforcement-learning